Posted to user@mahout.apache.org by Suneel Marthi <su...@yahoo.com> on 2014/01/16 15:41:09 UTC

MAHOUT 0.9 Release - New URL

Third time's a Charm!!!


Here's the new URL for Mahout 0.9 Release:
https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/

For those volunteering to test this, some of the things to be verified:

a) Verify that you can unpack the release (tar or zip)
b) Verify you are able to compile the distro
c) Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
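
For anyone who wants to script steps a) to c), here is a rough sketch. The archive name, the unpacked directory name, and the use of "mvn clean install -DskipTests" for the compile step are assumptions, not something stated in this mail; adjust them to whatever you actually downloaded. The helper names run and verify_release are made up for illustration. With DRY_RUN=1 the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: ARCHIVE and SRC_DIR are assumed names, not taken from the thread.
ARCHIVE="${ARCHIVE:-mahout-distribution-0.9-src.tar.gz}"
SRC_DIR="${SRC_DIR:-mahout-distribution-0.9}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper covering checks a) to c); stops at the first failure.
verify_release() {
  run tar -xzf "$ARCHIVE" || return 1             # a) unpack the release
  run cd "$SRC_DIR" || return 1                   # enter the unpacked tree
  run mvn clean install -DskipTests || return 1   # b) compile (assumed command)
  run mvn clean test                              # c) run the unit tests
}
```

Try "DRY_RUN=1 verify_release" first to see what would be executed. Step d) is left out on purpose, since the example scripts are interactive.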
     

Committers and PMC members:
---------------------------------------

Need 'at least 3 +1 votes' for the Release to pass. 
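
The pass rule above can be written down as a tiny shell check. This is only a sketch of one reading of it (at least three +1 votes, and more +1 than -1 votes); it is not an official ASF tool, and the function name vote_passes is made up.

```shell
# vote_passes PLUS MINUS
# Succeeds (exit status 0) when there are at least 3 "+1" votes
# and the "+1" votes outnumber the "-1" votes. Hypothetical helper.
vote_passes() {
  local plus="$1" minus="$2"
  [ "$plus" -ge 3 ] && [ "$plus" -gt "$minus" ]
}
```

For example, "vote_passes 3 1 && echo passed" prints "passed", while "vote_passes 2 0" returns a non-zero status.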


Thanks and Regards.

Re: MAHOUT 0.9 Release - New URL

Posted by Shannon Quinn <sq...@gatech.edu>.
a), b), and c) all pass for me. Don't have the setup yet at work to go 
through d), will wait for others to verify.



Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Stevo, could you test streaming k-means?


> On Jan 19, 2014, at 8:10 PM, Stevo Slavić <ss...@gmail.com> wrote:
> 
> +1 (binding)
> 
> 
>> On Sun, Jan 19, 2014 at 7:49 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>> 
>> I'll try to test out soon
>> 

Re: MAHOUT 0.9 Release - New URL

Posted by Stevo Slavić <ss...@gmail.com>.
+1 (binding)


On Sun, Jan 19, 2014 at 7:49 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> I'll try to test out soon
>

Re: MAHOUT 0.9 Release - New URL

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I'll try to test out soon

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
The reason you are seeing the error is that there were no sequence files in HDFS in MR mode to begin with => hence no term vectors were generated => and hence there were no vectors to cluster.

MR mode:
------------
1. Set HADOOP_HOME
2. Unset MAHOUT_LOCAL
3. Clean up your local /tmp/mahout-work-xxxxx directory
4. Run ./examples/bin/cluster-reuters.sh => option 4

Sequential Mode:
---------------------

1. Set MAHOUT_LOCAL=true
2. Add the "-xm sequential" flag to the cluster-reuters.sh script
3. Run ./examples/bin/cluster-reuters.sh => option 4
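
The environment part of the two setups above could be sketched as shell functions like these. The function names are made up for illustration, and the sketch assumes that any non-empty MAHOUT_LOCAL value selects local execution. It does not edit cluster-reuters.sh, so the "-xm sequential" flag still has to be added by hand for sequential mode.

```shell
# Sketch only: function names are hypothetical; the variable names come from the steps above.

# MR mode: point at a Hadoop install and make sure MAHOUT_LOCAL is not set.
use_mr_mode() {
  export HADOOP_HOME="$1"   # path to your Hadoop install, supplied by the caller
  unset MAHOUT_LOCAL
}

# Sequential mode: run everything locally.
use_sequential_mode() {
  export MAHOUT_LOCAL=true
}

# Report which mode the current environment selects
# (assumption: a non-empty MAHOUT_LOCAL means local/sequential).
current_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo sequential
  else
    echo mr
  fi
}
```

After picking a mode, clean up the /tmp/mahout-work-xxxxx directory and run ./examples/bin/cluster-reuters.sh with option 4 as described in the steps.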








On Sunday, January 19, 2014 12:22 PM, Frank Scholten <fr...@frankscholten.nl> wrote:
 
When I run in MR mode I get the same problem.

See http://pastebin.com/TXJ5mQmt




On Sun, Jan 19, 2014 at 5:31 PM, Frank Scholten <fr...@frankscholten.nl> wrote:

OK, running in MR mode now.
>
>
>
>
>On Sun, Jan 19, 2014 at 5:30 PM, Suneel Marthi <su...@yahoo.com> wrote:
>
>Its presently setup to run in MR mode (the way its been coded in cluster-reuters.sh). So setting MAHOUT_LOCAL=true is gonna fail for this.
>>I am able to see this fail locally when MAHOUT_LOCAL=true. 
>>
>>
>>
>>
>>
>>
>>On Sunday, January 19, 2014 11:17 AM, Frank Scholten <fr...@frankscholten.nl> wrote:
>>
>>Exported MAHOUT_LOCAL=true and still get the same results.
>>
>>
>>
>>On Sun, Jan 19, 2014 at 5:00 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>
>>> Frank,
>>>
>>> Were u running this with MAHOUT_LOCAL=true?
>>>
>>>
>>>
>>>
>>>
>>> On Sunday, January 19, 2014 10:29 AM, Frank Scholten <
>>> frank@frankscholten.nl> wrote:
>>>
>>> -1
>>>
>>> The cluster reuters example results in zero clusters when choosing
>>> streaming k-means. The other steps, unpacking and building do work.
>>>
>>> I see this stacktrace:
>>>
>>> INFO: Number of Centroids: 0
>>> Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
>>> WARNING: job_local797072544_0001
>>> java.lang.IllegalArgumentException: Must have nonzero number of training
>>> and test vectors. Asked for %.1f %% of %d vectors for test
>>> [10.000000149011612, 0]
>>>     at
>>> com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>>>     at
>>> org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>>>     at
>>> org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>>>     at
>>> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>>     at
>>> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>>     at
>>> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>>     at
>>> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>>     at
>>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>>
>>> Num clusters: 0; maxDistance: 0.000000
>>> [Dunn Index] First: Infinity
>>> [Davies-Bouldin Index] First: NaN
>>> Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 278 ms (Minutes: 0.004633333333333333)
>>> cluster,distance.mean,distance.sd
>>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>
>>>
>>> Here is the full log: http://pastebin.com/TxLV0rDr
>>>
>>> As of  yet I am  unfamiliar with the streaming k-means code and the
>>> algorithms behind it. If anyone has suggestion on what goes wrong in the
>>> code I am I happy to help  where I can.
>>>
>>>
>>> Frank
>>>
>>>
>>>
>>> On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com>
>>> wrote:
>>>
>>> Thanks Grant.
>>> >
>>> >Not sure if I can vote given my role as the BuildMeister/ReleaseMeister
>>> for 0.9.
>>> >Here's my +1 FWIW.
>>> >
>>> >a) Attached is the draft of the Release notes for 0.9, would definitely
>>> appreciate feedback on that.
>>> >
>>> >b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if
>>> a majority of atleast 3 +1 PMC votes are cast.
>>> >
>>> >The release files, including signatures, digests, etc can be found at:
>>> >
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >
>>> >The staging repository for this release can be found at:
>>> >https://repository.apache.org/content/repositories/orgapachemahout-1002
>>> >
>>> >Release artifacts have been signed with the following key:
>>> >https://people.apache.org/keys/committer/smarthi.asc
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <
>>> gsingers@apache.org> wrote:
>>> >
>>> >Ran the tests, verified sigs, tried out a few of the examples.
>>> >
>>> >+1 (binding)
>>> >
>>> >
>>> >On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com>
>>> wrote:
>>> >
>>> >> Third time's a Charm!!!
>>> >>
>>> >>
>>> >> Here's the new URL for Mahout 0.9 Release:
>>> >>
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >>
>>> >> For those volunteering to test this, some of the things to be verified:
>>> >>
>>> >> a) Verify that u can
>>>  unpack the release (tar or zip)
>>> >> b) Verify u r able to compile the distro
>>> >> c)  Run through the unit tests: mvn clean test
>>> >> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>>> through all the different options in each script.
>>> >>
>>> >>
>>> >> Committers
>>> >> and PMC members:
>>> >> ---------------------------------------
>>> >>
>>> >> Need 'at least 3 +1 votes' for the Release to pass.
>>> >>
>>> >>
>>> >> Thanks and Regards.
>>> >
>>> >
>>> >
>>> >
>>>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Frank Scholten <fr...@frankscholten.nl>.
When I run in MR mode I get the same problem.

See http://pastebin.com/TXJ5mQmt


On Sun, Jan 19, 2014 at 5:31 PM, Frank Scholten <fr...@frankscholten.nl>wrote:

> OK, running in MR mode now.

Re: MAHOUT 0.9 Release - New URL

Posted by Frank Scholten <fr...@frankscholten.nl>.
OK, running in MR mode now.


On Sun, Jan 19, 2014 at 5:30 PM, Suneel Marthi <su...@yahoo.com>wrote:

> Its presently setup to run in MR mode (the way its been coded in
> cluster-reuters.sh). So setting MAHOUT_LOCAL=true is gonna fail for this.
> I am able to see this fail locally when MAHOUT_LOCAL=true.

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
It works when both MAHOUT_LOCAL=true and the '-xm sequential' option are set.
I guess we will have to cut a release again with the '-xm sequential' option set.




On Sunday, January 19, 2014 11:31 AM, Suneel Marthi <su...@yahoo.com> wrote:
 
Its presently setup to run in MR mode (the way its been coded in cluster-reuters.sh). So setting MAHOUT_LOCAL=true is gonna fail for this.
I am able to see this fail locally when MAHOUT_LOCAL=true.  







Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
It's presently set up to run in MR mode (the way it's been coded in cluster-reuters.sh), so setting MAHOUT_LOCAL=true is going to fail for this.
I am able to see this fail locally when MAHOUT_LOCAL=true.





On Sunday, January 19, 2014 11:17 AM, Frank Scholten <fr...@frankscholten.nl> wrote:
 
Exported MAHOUT_LOCAL=true and still get the same results.




Re: MAHOUT 0.9 Release - New URL

Posted by Frank Scholten <fr...@frankscholten.nl>.
Exported MAHOUT_LOCAL=true and still get the same results.


On Sun, Jan 19, 2014 at 5:00 PM, Suneel Marthi <su...@yahoo.com>wrote:

> Frank,
>
> Were u running this with MAHOUT_LOCAL=true?
>
>
>
>
>
> On Sunday, January 19, 2014 10:29 AM, Frank Scholten <
> frank@frankscholten.nl> wrote:
>
> -1
>
> The cluster reuters example results in zero clusters when choosing
> streaming k-means. The other steps, unpacking and building do work.
>
> I see this stacktrace:
>
> INFO: Number of Centroids: 0
> Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local797072544_0001
> java.lang.IllegalArgumentException: Must have nonzero number of training
> and test vectors. Asked for %.1f %% of %d vectors for test
> [10.000000149011612, 0]
>     at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>     at
> org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>     at
> org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>     at
> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>     at
> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>     at
> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>     at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>     at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>
> Num clusters: 0; maxDistance: 0.000000
> [Dunn Index] First: Infinity
> [Davies-Bouldin Index] First: NaN
> Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 278 ms (Minutes: 0.004633333333333333)
> cluster,distance.mean,distance.sd
> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>
>
> Here is the full log: http://pastebin.com/TxLV0rDr
>
> As of  yet I am  unfamiliar with the streaming k-means code and the
> algorithms behind it. If anyone has suggestion on what goes wrong in the
> code I am I happy to help  where I can.
>
>
> Frank
>
>
>
> On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Thanks Grant.
> >
> >Not sure if I can vote given my role as the BuildMeister/ReleaseMeister
> for 0.9.
> >Here's my +1 FWIW.
> >
> >a) Attached is the draft of the Release notes for 0.9, would definitely
> appreciate feedback on that.
> >
> >b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if
> a majority of atleast 3 +1 PMC votes are cast.
> >
> >The release files, including signatures, digests, etc can be found at:
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >
> >The staging repository for this release can be found at:
> >https://repository.apache.org/content/repositories/orgapachemahout-1002
> >
> >Release artifacts have been signed with the following key:
> >https://people.apache.org/keys/committer/smarthi.asc
> >
> >
> >
> >
> >
> >
> >
> >
> >On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <
> gsingers@apache.org> wrote:
> >
> >Ran the tests, verified sigs, tried out a few of the examples.
> >
> >+1 (binding)
> >
> >
> >On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com>
> wrote:
> >
> >> Third time's a Charm!!!
> >>
> >>
> >> Here's the new URL for Mahout 0.9 Release:
> >>
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >>
> >> For those volunteering to test this, some of the things to be verified:
> >>
> >> a) Verify that u can unpack the release (tar or zip)
> >> b) Verify u r able to compile the distro
> >> c)  Run through the unit tests: mvn clean test
> >> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
> >>
> >>
> >> Committers
> >> and PMC members:
> >> ---------------------------------------
> >>
> >> Need 'at least 3 +1 votes' for the Release to pass.
> >>
> >>
> >> Thanks and Regards.
> >
> >
> >
> >
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Frank,

Were u running this with MAHOUT_LOCAL=true?





On Sunday, January 19, 2014 10:29 AM, Frank Scholten <fr...@frankscholten.nl> wrote:
 
-1

The cluster reuters example results in zero clusters when choosing streaming k-means. The other steps, unpacking and building, do work.

I see this stacktrace:

INFO: Number of Centroids: 0
Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local797072544_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 278 ms (Minutes: 0.004633333333333333)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train


Here is the full log: http://pastebin.com/TxLV0rDr

As of yet I am unfamiliar with the streaming k-means code and the algorithms behind it. If anyone has suggestions on what goes wrong in the code, I am happy to help where I can.


Frank



On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com> wrote:

Thanks Grant. 
>
>Not sure if I can vote given my role as the BuildMeister/ReleaseMeister for 0.9. 
>Here's my +1 FWIW.
>
>a) Attached is the draft of the Release notes for 0.9, would definitely appreciate feedback on that.
>
>b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if a majority of at least 3 +1 PMC votes are cast.
>
>The release files, including signatures, digests, etc can be found at:
>https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
>The staging repository for this release can be found at:
>https://repository.apache.org/content/repositories/orgapachemahout-1002
>
>Release artifacts have been signed with the following key:
>https://people.apache.org/keys/committer/smarthi.asc
>
>
>
>
>
>
>
>
>On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <gs...@apache.org> wrote:
> 
>Ran the tests, verified sigs, tried out a few of the examples.
>
>+1 (binding)
>
>
>On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com> wrote:
>
>> Third time's a Charm!!!
>> 
>> 
>> Here's the new URL for Mahout 0.9 Release:
>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>> 
>> For those volunteering to test this, some of the things to be verified:
>> 
>> a) Verify that u can unpack the release (tar or zip)
>> b) Verify u r able to compile the distro
>> c)  Run through the unit tests: mvn clean test
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>>      
>> 
>> Committers
>> and PMC members:
>> ---------------------------------------
>> 
>> Need 'at least 3 +1 votes' for the Release to pass. 
>> 
>> 
>> Thanks and Regards.
>
>
>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Frank Scholten <fr...@frankscholten.nl>.
-1

The cluster reuters example results in zero clusters when choosing
streaming k-means. The other steps, unpacking and building, do work.

I see this stacktrace:

INFO: Number of Centroids: 0
Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local797072544_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)

Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 278 ms (Minutes: 0.004633333333333333)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train

Here is the full log: http://pastebin.com/TxLV0rDr
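
A note on reading the exception message above: the unexpanded "%.1f %% of %d" placeholders followed by "[10.000000149011612, 0]" are consistent with how Guava's Preconditions.checkArgument treats message templates. It substitutes only "%s" and appends any unmatched arguments in square brackets. Read that way, the message most likely means that about 10% of 0 vectors was requested for the test split, which fits the "Number of Centroids: 0" line in the log. The Python below is only a minimal sketch that mimics this formatting behaviour; lenient_format is a hypothetical helper name, not Mahout or Guava code.

```python
def lenient_format(template, *args):
    """Mimic Guava-style templating: only "%s" is substituted, in order;
    any arguments left over are appended in square brackets."""
    parts = []
    pos = 0
    used = 0
    while used < len(args):
        idx = template.find("%s", pos)
        if idx == -1:
            break  # no more placeholders; remaining args go in brackets
        parts.append(template[pos:idx])
        parts.append(str(args[used]))
        used += 1
        pos = idx + 2
    parts.append(template[pos:])
    if used < len(args):
        parts.append(" [" + ", ".join(str(a) for a in args[used:]) + "]")
    return "".join(parts)


# Template text copied from the log above. The first argument is passed as a
# string to sidestep float-printing differences between Java and Python.
template = ("Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test")
print(lenient_format(template, "10.000000149011612", 0))
```

If this reading is right, switching the template to "%s" placeholders would make the message readable, but the underlying problem would remain that no centroids reach the reducer.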

As of yet I am unfamiliar with the streaming k-means code and the
algorithms behind it. If anyone has suggestions on what goes wrong in the
code, I am happy to help where I can.

Frank

On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com> wrote:

> Thanks Grant.
>
> Not sure if I can vote given my role as the BuildMeister/ReleaseMeister
> for 0.9.
> Here's my +1 FWIW.
>
> a) Attached is the draft of the Release notes for 0.9, would definitely
> appreciate feedback on that.
>
> b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> The release files, including signatures, digests, etc can be found at:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachemahout-1002
>
> Release artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
>
>
>
>
>
>
>   On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <
> gsingers@apache.org> wrote:
>  Ran the tests, verified sigs, tried out a few of the examples.
>
> +1 (binding)
>
> On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> > Third time's a Charm!!!
> >
> >
> > Here's the new URL for Mahout 0.9 Release:
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >
> > For those volunteering to test this, some of the things to be verified:
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c)  Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
> >
> >
> > Committers
> > and PMC members:
> > ---------------------------------------
> >
> > Need 'at least 3 +1 votes' for the Release to pass.
> >
> >
> > Thanks and Regards.
>
>
>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Grant. 

Not sure if I can vote given my role as the BuildMeister/ReleaseMeister for 0.9. 
Here's my +1 FWIW.

a) Attached is the draft of the Release notes for 0.9, would definitely appreciate feedback on that.

b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if a majority of at least 3 +1 PMC votes are cast.

The release files, including signatures, digests, etc can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1002

Release artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc







On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <gs...@apache.org> wrote:
 
Ran the tests, verified sigs, tried out a few of the examples.

+1 (binding)


On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com> wrote:

> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> 
> For those volunteering to test this, some of the things to be verified:
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>      
> 
> Committers
> and PMC members:
> ---------------------------------------
> 
> Need 'at least 3 +1 votes' for the Release to pass. 
> 
> 
> Thanks and Regards.

Re: MAHOUT 0.9 Release - New URL

Posted by Grant Ingersoll <gs...@apache.org>.
Ran the tests, verified sigs, tried out a few of the examples.

+1 (binding)

On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com> wrote:

> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> 
> For those volunteering to test this, some of the things to be verified:
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>      
> 
> Committers
> and PMC members:
> ---------------------------------------
> 
> Need 'at least 3 +1 votes' for the Release to pass. 
> 
> 
> Thanks and Regards.



Re: MAHOUT 0.9 Release - New URL

Posted by Shannon Quinn <sq...@gatech.edu>.
OS X 10.9.1, java version 1.6.0_65.

On 1/16/14, 10:41 AM, Sergey Svinarchuk wrote:
> I tested mahout 0.9 on Ubuntu 12.04 64bit, java version "1.6.0_27"
>
> a) Verify that u can unpack the release (tar or zip) - passed
> b) Verify u r able to compile the distro - passed
> c)  Run through the unit tests: mvn clean test -passed
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script. - will update later
>
>
> On Thu, Jan 16, 2014 at 5:35 PM, Sotiris Salloumis <in...@eprice.gr> wrote:
>
>> Hi Suneel,
>>
>> Below first round of tests,
>>
>> Environment: SMP Debian 3.2.51-1 x86_64
>> Machine: Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz stepping 05 12GB
>> RAM
>> OpenJDK: javac 1.6.0_27
>>
>> a) Verify that u can unpack the release (tar or zip)  [ Passed: tar -zxvf ]
>> b) Verify u r able to compile the distro  [ Passed: With OpenJDK, Latest
>> Maven on LatestDebian ]
>> c)  Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
>>
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>> through all the different options in each script. [Ongoing will update
>> later]
>>
>> Regards
>> Sotiris
>>
>> -----Original Message-----
>> From: Suneel Marthi [mailto:suneel_marthi@yahoo.com]
>> Sent: Thursday, January 16, 2014 4:41 PM
>> To: user@mahout.apache.org; mahout
>> Subject: MAHOUT 0.9 Release - New URL
>>
>> Third time's a Charm!!!
>>
>>
>> Here's the new URL for Mahout 0.9 Release:
>>
>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/
>> apache/mahout/mahout-distribution/0.9/
>>
>> For those volunteering to test this, some of the things to be verified:
>>
>> a) Verify that u can unpack the release (tar or zip)
>> b) Verify u r able to compile the distro
>> c)  Run through the unit tests: mvn clean test
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>> through all the different options in each script.
>>
>>
>> Committers
>>   and PMC members:
>> ---------------------------------------
>>
>> Need 'at least 3 +1 votes' for the Release to pass.
>>
>>
>> Thanks and Regards.
>>
>>


Re: MAHOUT 0.9 Release - New URL

Posted by Sergey Svinarchuk <ss...@hortonworks.com>.
I tested mahout 0.9 on Ubuntu 12.04 64bit, java version "1.6.0_27"

a) Verify that u can unpack the release (tar or zip) - passed
b) Verify u r able to compile the distro - passed
c)  Run through the unit tests: mvn clean test -passed
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
through all the different options in each script. - will update later


On Thu, Jan 16, 2014 at 5:35 PM, Sotiris Salloumis <in...@eprice.gr> wrote:

> Hi Suneel,
>
> Below first round of tests,
>
> Environment: SMP Debian 3.2.51-1 x86_64
> Machine: Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz stepping 05 12GB
> RAM
> OpenJDK: javac 1.6.0_27
>
> a) Verify that u can unpack the release (tar or zip)  [ Passed: tar -zxvf ]
> b) Verify u r able to compile the distro  [ Passed: With OpenJDK, Latest
> Maven on LatestDebian ]
> c)  Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
>
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script. [Ongoing will update
> later]
>
> Regards
> Sotiris
>
> -----Original Message-----
> From: Suneel Marthi [mailto:suneel_marthi@yahoo.com]
> Sent: Thursday, January 16, 2014 4:41 PM
> To: user@mahout.apache.org; mahout
> Subject: MAHOUT 0.9 Release - New URL
>
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/
> apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
>
>
> Committers
>  and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
>
>


RE: MAHOUT 0.9 Release - New URL

Posted by Sotiris Salloumis <in...@eprice.gr>.
Sorry, my mistake: the milliseconds figure was for the last test only. The full results are below.

~/mahout/apache-maven-3.1.1/bin/mvn -DskipTests clean install

[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 52.312s
[INFO] Finished at: Sat Jan 18 02:04:29 CET 2014
[INFO] Final Memory: 46M/305M
[INFO] ------------------------------------------------------------------------

~/mahout/apache-maven-3.1.1/bin/mvn clean test

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools ................................ SUCCESS [1.166s]
[INFO] Apache Mahout ..................................... SUCCESS [0.264s]
[INFO] Mahout Math ....................................... SUCCESS [58.639s]
[INFO] Mahout Core ....................................... SUCCESS [4:01.640s]
[INFO] Mahout Integration ................................ SUCCESS [21.481s]
[INFO] Mahout Examples ................................... SUCCESS [1.980s]
[INFO] Mahout Release Package ............................ SUCCESS [0.003s]
[INFO] Mahout Math/Scala wrappers ........................ SUCCESS [14.149s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5:39.563s
[INFO] Finished at: Sat Jan 18 02:10:53 CET 2014
[INFO] Final Memory: 51M/1068M
[INFO] ------------------------------------------------------------------------
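
For anyone cross-checking the timings, the per-module figures in the reactor summary can be added up and compared with the reported total. The Python below is a rough sketch under the assumption that durations are written as "ss.mmms" or "m:ss.mmms", as in the output above; parse_maven_duration is a hypothetical helper name, not part of Maven.

```python
def parse_maven_duration(text):
    """Convert a duration such as "58.639s" or "4:01.640s" into seconds.

    Colon-separated fields are treated as base-60 (minutes, then hours)."""
    fields = text.strip().rstrip("s").split(":")
    seconds = 0.0
    for field in fields:
        seconds = seconds * 60 + float(field)
    return seconds


# Module times copied from the reactor summary above.
module_times = ["1.166s", "0.264s", "58.639s", "4:01.640s",
                "21.481s", "1.980s", "0.003s", "14.149s"]

module_total = sum(parse_maven_duration(t) for t in module_times)
reported_total = parse_maven_duration("5:39.563s")

print(f"sum of modules: {module_total:.3f}s")
print(f"reported total: {reported_total:.3f}s")
```

The module times sum to roughly 339.3 seconds against a reported total of 339.563 seconds, so the full unit test run took a little under six minutes rather than milliseconds.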

 

From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Saturday, January 18, 2014 2:50 AM
To: Mahout Dev List; Sotiris Salloumis
Cc: Suneel Marthi; user@mahout.apache.org
Subject: Re: MAHOUT 0.9 Release - New URL

 

 

On Thu, Jan 16, 2014 at 7:35 AM, Sotiris Salloumis <in...@eprice.gr> wrote:

c)  Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]

 

?!

 

Was that seconds?  Or really milliseconds?


Re: MAHOUT 0.9 Release - New URL

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Jan 16, 2014 at 7:35 AM, Sotiris Salloumis <in...@eprice.gr> wrote:

> c)  Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
>


?!

Was that seconds?  Or really milliseconds?

RE: MAHOUT 0.9 Release - New URL

Posted by Sotiris Salloumis <in...@eprice.gr>.
Hi Suneel, 

Below first round of tests, 

Environment: SMP Debian 3.2.51-1 x86_64
Machine: Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz stepping 05 12GB
RAM
OpenJDK: javac 1.6.0_27

a) Verify that u can unpack the release (tar or zip)  [ Passed: tar -zxvf ]
b) Verify u r able to compile the distro  [ Passed: With OpenJDK, Latest
Maven on LatestDebian ]
c)  Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
through all the different options in each script. [Ongoing will update
later]
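
Steps a) to c) can be scripted so that every tester runs the same commands. The Python below is a minimal sketch only: the archive name is an assumption (use whatever file you downloaded from the staging URL), verification_commands and run_all are hypothetical helper names, and the commands are the ones quoted in this thread (tar -zxvf, mvn -DskipTests clean install, mvn clean test). By default it prints the commands without executing them.

```python
import subprocess


def verification_commands(archive, maven="mvn"):
    """Return the commands for steps a) to c) as argument lists."""
    return [
        ["tar", "-zxvf", archive],                    # a) unpack the release
        [maven, "-DskipTests", "clean", "install"],   # b) compile the distro
        [maven, "clean", "test"],                     # c) run the unit tests
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Note: the two Maven commands must be run from inside the unpacked
    source directory, so change into it before a real run."""
    for cmd in commands:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# The archive name below is an assumption for illustration.
commands = verification_commands("mahout-distribution-0.9-src.tar.gz")
run_all(commands)
```

Step d), the example scripts under $MAHOUT_HOME/examples/bin, is interactive and is left out of this sketch.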

Regards
Sotiris

-----Original Message-----
From: Suneel Marthi [mailto:suneel_marthi@yahoo.com] 
Sent: Thursday, January 16, 2014 4:41 PM
To: user@mahout.apache.org; mahout
Subject: MAHOUT 0.9 Release - New URL 

Third time's a Charm!!!


Here's the new URL for Mahout 0.9 Release:
https://repository.apache.org/content/repositories/orgapachemahout-1002/org/
apache/mahout/mahout-distribution/0.9/

For those volunteering to test this, some of the things to be verified:

a) Verify that u can unpack the release (tar or zip)
b) Verify u r able to compile the distro
c)  Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
through all the different options in each script.
     

Committers
 and PMC members:
---------------------------------------

Need 'at least 3 +1 votes' for the Release to pass. 


Thanks and Regards.


Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have been removed from 0.9, cluster-syntheticcontrol.sh options 4 and 5 are not going to work and should have been removed for 0.9.

To PMC,

 -> Should we roll back the release, fix this issue (along with the other patches that were submitted in the last few days), and put out another release?







On Monday, January 20, 2014 12:33 AM, Andrew Palumbo <ap...@outlook.com> wrote:
 
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had a bit of trouble getting the Hadoop natives to compile, so some of the problems below may be due to my Hadoop setup. I ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when I'm sure I've got Hadoop set up right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed ]

b) Verify u r able to compile the distro

    mvn compile [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. 
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh ->1 [works]
    ./cluster-syntheticcontrol.sh ->2 [works]
    ./cluster-syntheticcontrol.sh ->3 [works]


    ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh ->1 [works]
    ./classify-20newsgroups.sh ->2 [works]


    cluster-reuters.sh ->1 [works]
    cluster-reuters.sh ->2 [works]
    cluster-reuters.sh ->3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh ->4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.000000
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train






> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL 
> To: user@mahout.apache.org; dev@mahout.apache.org
> 
> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> 
> For those volunteering to test this, some of the things to be verified:
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>      
> 
> Committers
>  and PMC members:
> ---------------------------------------
> 
> Need 'at least 3 +1 votes' for the Release to pass. 
> 
> 
> Thanks and Regards.

RE: MAHOUT 0.9 Release - New URL

Posted by Andrew Palumbo <ap...@outlook.com>.
I checked out the latest code.  Compiled and tested on a CentOS 6 AMD64 VM without incident.

$MAHOUT_LOCAL=true

classify-20newsgroups.sh->1
classify-20newsgroups.sh->2
classify-20newsgroups.sh->3

cluster-reuters.sh->1
cluster-reuters.sh->2
cluster-reuters.sh->3
cluster-reuters.sh->4 [0 clusters]

cluster-syntheticcontrol.sh->1
cluster-syntheticcontrol.sh->2
cluster-syntheticcontrol.sh->3

factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat

All ran without incident.

I don't have the Netflix dataset, so I can't test that.




> Date: Tue, 21 Jan 2014 13:47:27 -0800
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: andrew.musselman@gmail.com
> To: dev@mahout.apache.org
> 
> *classify-20newsgroups.sh*
> 
> *Complementary naive bayes:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :      11207       98.9406%
> Incorrectly Classified Instances        :        120        1.0594%
> Total Classified Instances              :      11327
> 
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>     k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 475     0       0       1       0       0       0       0       0       0
>     0       0       0       0       1       0       1       0       0
>  0         |  478         a     = alt.atheism
> 0       597     1       1       0       1       1       0       0       0
>     0       1       0       2       1       0       0       0       0
>  0         |  605         b     = comp.graphics
> 0       1       620     3       0       1       0       0       0       0
>     0       1       0       0       1       0       0       0       0
>  0         |  627         c     = comp.os.ms-windows.misc
> 1       1       1       593     2       0       0       0       0       0
>     0       0       0       0       0       1       0       0       0
>  0         |  599         d     = comp.sys.ibm.pc.hardware
> 0       1       1       0       568     0       1       0       0       0
>     1       1       2       0       0       0       0       1       0
>  0         |  576         e     = comp.sys.mac.hardware
> 0       4       2       0       0       581     0       0       0       0
>     0       0       0       0       0       0       0       0       0
>  0         |  587         f     = comp.windows.x
> 0       0       0       1       2       0       571     3       0       0
>     1       1       4       1       0       0       0       0       0
>  0         |  584         g     = misc.forsale
> 0       0       0       1       0       0       0       589     1       0
>     0       1       1       0       0       0       0       0       0
>  0         |  593         h     = rec.autos
> 0       0       0       0       0       0       0       1       565     0
>     0       0       0       0       1       0       0       0       0
>  0         |  567         i     = rec.motorcycles
> 0       0       0       0       0       0       0       0       0       600
>     2       0       0       0       1       0       0       0       0
>  0         |  603         j     = rec.sport.baseball
> 0       0       0       0       0       0       0       0       0       1
>     584     0       0       0       0       0       0       0       0
>  0         |  585         k     = rec.sport.hockey
> 0       0       0       0       0       0       0       0       0       0
>     0       579     0       0       0       0       0       1       0
>  0         |  580         l     = sci.crypt
> 0       0       0       1       3       0       2       0       0       2
>     0       0       567     1       2       1       0       0       0
>  0         |  579         m     = sci.electronics
> 0       0       0       0       0       0       0       0       0       0
>     0       0       1       605     0       0       0       0       0
>  0         |  606         n     = sci.med
> 0       0       0       0       0       0       0       0       0       0
>     0       0       0       0       602     0       0       0       0
>  0         |  602         o     = sci.space
> 0       0       0       0       0       0       0       0       0       0
>     0       0       0       1       0       602     0       0       1
>  0         |  604         p     = soc.religion.christian
> 0       0       0       0       0       0       0       0       0       0
>     0       0       0       0       0       0       556     0       0
>  0         |  556         q     = talk.politics.mideast
> 0       0       1       0       0       0       0       0       0       0
>     0       1       0       0       1       0       0       568     0
>  0         |  571         r     = talk.politics.guns
> 11      0       0       0       0       0       0       0       0       1
>     0       0       0       1       3       8       1       4       338
>  2         |  369         s     = talk.religion.misc
> 0       0       0       0       0       0       0       0       0       0
>     1       0       0       0       1       0       3       4       0
>  447       |  456         t     = talk.politics.misc
> 
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.9806
> Accuracy                                   98.9406%
> Reliability                                94.0932%
> Reliability (standard deviation)            0.2163
> 
> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 15870 ms (Minutes: 0.2645)
> + echo 'Testing on holdout set'
> Testing on holdout set
> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m
> /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow
> -o /tmp/mahout-work-ec2-user/20news-testing -c
> 
> [snip]
> 
> INFO: Complementary Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       6715       89.3071%
> Incorrectly Classified Instances        :        804       10.6929%
> Total Classified Instances              :       7519
> 
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>     k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 298     0       0       0       0       0       0       0       0       1
>     0       0       0       1       2       5       1       0       13
> 0         |  321         a     = alt.atheism
> 0       298     11      6       1       12      2       2       1       1
>     3       8       3       4       2       4       1       4       4
>  1         |  368         b     = comp.graphics
> 1       17      286     16      4       9       6       3       2       0
>     1       0       1       7       1       0       2       1       0
>  1         |  358         c     = comp.os.ms-windows.misc
> 2       6       11      309     9       5       14      8       1       0
>     2       0       6       4       2       0       1       2       1
>  0         |  383         d     = comp.sys.ibm.pc.hardware
> 0       10      8       7       334     7       5       5       2       0
>     3       0       2       1       1       0       1       1       0
>  0         |  387         e     = comp.sys.mac.hardware
> 1       13      7       8       2       355     2       0       2       0
>     0       5       1       1       3       0       0       1       0
>  0         |  401         f     = comp.windows.x
> 0       7       11      29      12      9       268     16      8       4
>     3       2       6       4       2       1       3       1       2
>  3         |  391         g     = misc.forsale
> 0       1       0       0       3       0       7       362     8       2
>     2       1       2       0       2       0       1       2       0
>  4         |  397         h     = rec.autos
> 0       0       0       1       0       0       1       0       423     0
>     0       0       2       1       0       1       0       0       0
>  0         |  429         i     = rec.motorcycles
> 0       0       1       0       0       0       0       2       2       371
>     8       0       2       3       0       2       0       0       0
>  0         |  391         j     = rec.sport.baseball
> 0       0       1       0       0       0       1       0       0       2
>     409     0       0       0       0       0       0       0       0
>  1         |  414         k     = rec.sport.hockey
> 0       0       1       2       1       0       1       0       0       0
>     0       404     0       0       0       0       0       1       0
>  1         |  411         l     = sci.crypt
> 0       5       4       11      1       3       7       9       2       5
>     3       3       339     2       6       0       1       1       2
>  1         |  405         m     = sci.electronics
> 0       4       0       1       0       0       0       1       0       1
>     1       0       3       367     3       1       2       0       0
>  0         |  384         n     = sci.med
> 0       1       2       0       1       0       2       0       0       1
>     0       0       1       1       375     0       1       0       0
>  0         |  385         o     = sci.space
> 4       2       1       1       0       0       1       1       2       0
>     0       1       1       5       1       367     4       0       1
>  1         |  393         p     = soc.religion.christian
> 0       1       0       0       0       0       0       0       0       2
>     0       0       0       0       0       2       378     0       1
>  0         |  384         q     = talk.politics.mideast
> 0       0       0       0       0       2       1       1       1       1
>     0       3       0       3       0       0       2       319     2
>  4         |  339         r     = talk.politics.guns
> 32      0       0       1       0       0       0       0       0       1
>     1       1       0       2       2       26      5       7       175
>  6         |  259         s     = talk.religion.misc
> 0       0       0       2       0       0       0       0       0       1
>     2       2       0       1       2       1       10      18      2
>  278       |  319         t     = talk.politics.misc
> 
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.8594
> Accuracy                                   89.3071%
> Reliability                                 84.611%
> Reliability (standard deviation)            0.2148
> 
> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> 
> 
> *Naive bayes:*
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :      11286       99.0869%
> Incorrectly Classified Instances        :        104        0.9131%
> Total Classified Instances              :      11390
> 
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>     k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 474     0       0       0       0       0       0       0       0       0
>     0       0       0       0       0       0       0       0       2
>  1         |  477         a     = alt.atheism
> 0       566     0       2       0       1       0       0       0       0
>     0       0       0       0       0       0       0       0       0
>  0         |  569         b     = comp.graphics
> 0       10      590     29      2       4       1       0       0       0
>     0       0       1       0       0       0       0       0       0
>  1         |  638         c     = comp.os.ms-windows.misc
> 0       0       0       596     0       0       0       0       0       0
>     0       0       0       0       0       0       0       0       0
>  0         |  596         d     = comp.sys.ibm.pc.hardware
> 0       0       0       0       575     0       1       0       0       0
>     0       0       1       0       0       0       0       0       0
>  0         |  577         e     = comp.sys.mac.hardware
> 0       2       2       2       0       593     1       0       0       0
>     0       0       0       0       1       0       0       0       0
>  0         |  601         f     = comp.windows.x
> 0       0       0       1       0       0       589     1       0       0
>     1       0       2       0       0       0       0       0       0
>  0         |  594         g     = misc.forsale
> 0       0       0       0       0       0       0       594     0       0
>     0       0       0       0       0       0       0       0       0
>  0         |  594         h     = rec.autos
> 0       0       0       0       0       0       0       0       611     0
>     0       0       0       0       0       0       0       0       0
>  0         |  611         i     = rec.motorcycles
> 0       0       0       0       0       0       0       0       0       616
>     1       0       0       0       0       0       0       0       0
>  0         |  617         j     = rec.sport.baseball
> 0       0       0       0       0       0       1       0       0       0
>     620     0       0       0       0       0       0       0       0
>  0         |  621         k     = rec.sport.hockey
> 0       0       0       0       0       0       0       0       0       0
>     0       580     0       0       0       0       0       1       0
>  0         |  581         l     = sci.crypt
> 0       0       0       3       1       0       0       0       0       0
>     0       0       571     0       0       0       0       0       0
>  0         |  575         m     = sci.electronics
> 0       0       0       0       0       0       0       0       0       0
>     0       0       2       583     0       0       0       0       0
>  0         |  585         n     = sci.med
> 0       0       0       0       0       0       0       0       0       0
>     0       0       0       1       599     0       0       0       0
>  0         |  600         o     = sci.space
> 0       1       0       0       0       0       0       0       0       0
>     0       0       0       0       0       615     0       0       0
>  0         |  616         p     = soc.religion.christian
> 1       0       0       0       0       0       0       0       0       0
>     0       0       0       0       0       1       560     0       0
>  0         |  562         q     = talk.politics.mideast
> 0       0       1       0       0       0       0       0       0       0
>     0       1       0       0       0       0       0       548     0
>  1         |  551         r     = talk.politics.guns
> 10      0       0       0       0       0       0       0       0       0
>     0       0       0       0       1       1       0       2       344
>  1         |  359         s     = talk.religion.misc
> 0       0       0       0       0       0       0       0       0       0
>     0       1       1       0       0       0       0       2       0
>  462       |  466         t     = talk.politics.misc
> 
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.9847
> Accuracy                                   99.0869%
> Reliability                                94.3334%
> Reliability (standard deviation)            0.2169
> 
> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 14304 ms (Minutes: 0.2384)
> + echo 'Testing on holdout set'
> Testing on holdout set
> 
> [snip]
> 
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       6718       90.1019%
> Incorrectly Classified Instances        :        738        9.8981%
> Total Classified Instances              :       7456
> 
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>     k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 294     0       0       0       0       0       0       0       0       0
>     0       2       0       1       1       6       1       1       16
> 0         |  322         a     = alt.atheism
> 0       345     6       14      6       11      6       0       0       0
>     0       5       7       1       3       0       0       0       0
>  0         |  404         b     = comp.graphics
> 2       29      177     78      22      19      9       1       0       0
>     0       4       2       0       1       1       0       0       1
>  1         |  347         c     = comp.os.ms-windows.misc
> 1       9       2       335     18      2       10      0       0       0
>     1       0       8       0       0       0       0       0       0
>  0         |  386         d     = comp.sys.ibm.pc.hardware
> 1       4       2       13      347     3       5       1       0       0
>     1       0       7       1       0       0       0       1       0
>  0         |  386         e     = comp.sys.mac.hardware
> 0       20      0       4       0       352     4       0       0       0
>     0       0       1       1       3       0       1       0       1
>  0         |  387         f     = comp.windows.x
> 0       2       0       21      5       1       323     7       2       2
>     0       2       12      0       3       0       0       0       0
>  1         |  381         g     = misc.forsale
> 0       1       0       0       1       0       15      363     8       1
>     0       0       4       1       0       0       0       1       0
>  1         |  396         h     = rec.autos
> 0       1       0       0       0       0       6       6       370     0
>     0       0       0       1       0       0       0       0       1
>  0         |  385         i     = rec.motorcycles
> 1       0       0       1       1       0       2       1       2       362
>     5       0       2       0       0       0       0       0       0
>  0         |  377         j     = rec.sport.baseball
> 0       0       0       1       2       0       0       0       0       3
>     371     0       0       0       0       0       0       0       0
>  1         |  378         k     = rec.sport.hockey
> 0       3       1       0       1       0       2       0       0       0
>     0       396     0       1       0       0       1       1       1
>  3         |  410         l     = sci.crypt
> 0       7       0       7       7       2       6       4       0       0
>     0       1       369     2       2       0       0       0       0
>  2         |  409         m     = sci.electronics
> 0       3       0       2       1       0       2       0       0       0
>     0       1       4       383     4       0       0       1       0
>  4         |  405         n     = sci.med
> 0       5       0       0       1       0       3       0       0       0
>     0       0       1       0       374     1       0       0       1
>  1         |  387         o     = sci.space
> 6       2       0       1       1       0       0       1       0       1
>     0       0       1       5       0       352     2       1       7
>  1         |  381         p     = soc.religion.christian
> 1       1       0       0       0       0       0       0       0       0
>     1       0       0       0       0       0       373     1       0
>  1         |  378         q     = talk.politics.mideast
> 0       0       0       0       0       0       1       0       1       0
>     0       2       0       0       0       0       0       346     2
>  7         |  359         r     = talk.politics.guns
> 26      1       0       1       0       0       0       2       0       1
>     1       0       0       1       1       20      2       6       200
>  7         |  269         s     = talk.religion.misc
> 1       0       0       0       0       0       0       2       0       0
>     1       0       0       2       2       0       1       14      0
>  286       |  309         t     = talk.politics.misc
> 
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.8726
> Accuracy                                   90.1019%
> Reliability                                85.4491%
> Reliability (standard deviation)            0.2222
> 
> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10878 ms (Minutes: 0.1813)
> 
> *SGD:*
> 7532 test files
> 
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       5649            75%
> Incorrectly Classified Instances        :       1883            25%
> Total Classified Instances              :       7532
> 
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>     k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 186     6       3       10      5       0       33      4       13      15
>      7       1       24      15      3       15      5       5       29
> 15        |  394         a     = sci.space
> 5       309     0       3       2       5       0       0       0       1
>     9       21      2       0       0       18      4       4       1
>  1         |  385         b     = comp.sys.mac.hardware
> 4       1       101     3       0       1       63      0       7       0
>     1       1       5       16      3       0       3       7       1
>  34        |  251         c     = talk.religion.misc
> 11      12      1       265     1       10      3       0       0       17
>      10      11      5       2       0       11      3       6       21
> 0         |  389         d     = comp.graphics
> 2       1       1       0       349     2       3       0       3       2
>     6       1       5       1       0       2       15      2       1
>  2         |  398         e     = rec.motorcycles
> 7       20      3       19      2       254     6       0       2       11
>      2       39      7       2       0       4       2       2       9
>  3         |  394         f     = comp.os.ms-windows.misc
> 2       1       13      0       0       0       247     0       1       1
>     3       0       6       2       4       0       2       3       5
>  29        |  319         g     = alt.atheism
> 1       1       0       0       2       0       2       361     0       1
>     2       0       2       0       0       1       3       22      0
>  1         |  399         h     = rec.sport.hockey
> 3       0       3       1       0       0       5       0       161     0
>     1       2       12      102     0       0       1       2       11
> 6         |  310         i     = talk.politics.misc
> 2       8       0       19      0       19      0       0       1       294
>     10      11      4       2       0       5       0       3       11
> 6         |  395         j     = comp.windows.x
> 2       10      0       1       1       0       0       0       0       1
>     347     13      2       1       0       5       3       2       2
>  0         |  390         k     = misc.forsale
> 1       36      0       6       1       25      0       0       1       6
>     10      257     2       1       0       34      6       0       6
>  0         |  392         l     = comp.sys.ibm.pc.hardware
> 2       2       2       2       1       0       12      0       0       6
>     10      4       312     5       2       13      11      3       3
>  6         |  396         m     = sci.med
> 2       0       3       2       1       0       0       1       13      0
>     5       1       2       314     2       0       2       2       10
> 4         |  364         n     = talk.politics.guns
> 1       0       2       1       1       0       34      1       33      1
>     3       0       1       8       271     1       4       5       6
>  3         |  376         o     = talk.politics.mideast
> 3       14      0       8       2       8       3       1       1       7
>     12      29      6       2       1       245     13      2       32
> 4         |  393         p     = sci.electronics
> 3       3       0       2       11      0       1       0       2       1
>     11      6       4       2       0       11      330     4       4
>  1         |  396         q     = rec.autos
> 0       0       1       0       1       0       4       12      3       1
>     3       0       0       0       0       5       6       359     1
>  1         |  397         r     = rec.sport.baseball
> 0       1       0       0       0       1       0       0       3       3
>     0       0       3       2       1       6       1       6       366
>  3         |  396         s     = sci.crypt
> 0       2       11      1       1       0       40      0       1       2
>     3       4       2       1       0       5       0       2       2
>  321       |  398         t     = soc.religion.christian
> 
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.7073
> Accuracy                                        75%
> Reliability                                70.6238%
> Reliability (standard deviation)            0.2187
> Log-likelihood                mean      :    -1.1182
>                               25%-ile   :    -1.6911
>                               75%-ile   :    -0.0803
> 
> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> 
> 
> 
> 
> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
> 
> > Thanks Andrew for reporting that. I rolled back the release to fix this
> > and a few other issues.
> >
> > We have removed asf-examples*.sh from trunk as the sample file at the URL
> > mentioned in your email is not available.
> > This is something we need to fix and restore in 1.0.
> >
> >
> >
> >
> >
> >
> >
> > On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
> > wrote:
> >
> > from the asf-email-examples.sh script:
> >
> > # You will need to download or otherwise obtain some or all of the Amazon
> > ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to
> > use this script.
> > # To obtain a full copy you will need to launch an EC2 instance and mount
> > the dataset to download it, otherwise you can get a sample of it at
> > # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> >
> > It looks like the
> > http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > link is down.
> >
> > Is there somewhere else that we can get a subset of the ASF emails?
> >
> >
> >
> > > Date: Tue, 21 Jan 2014 09:48:06 -0800
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > From: andrew.musselman@gmail.com
> > > To: dev@mahout.apache.org
> > >
> > > Sure thing; continuing to smoke test the other examples tonight
> > >
> > >
> > > > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > >
> > > > > Thanks Andrew M., I see that some of the example scripts need to be
> > > > > fixed as they still refer to the deprecated algorithms.
> > > > > I see that Streaming KMeans has failed for you as well.
> > > >
> > > > I'll be rolling back the release today to fix these issues.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > > > andrew.musselman@gmail.com> wrote:
> > > >
> > > > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> > > > > 64-bit Linux AMI from tarball.
> > > >
> > > > All tests pass.
> > > >
> > > > *Output of examples:*
> > > > *asf-email-examples.sh, run on mahout.apache.org
> > > > <http://mahout.apache.org>:*
> > > > *recommendations:*
> > > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> > > > 1
> > > >
> > > >
> > [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > > > 4
> > > >
> > > >
> > [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > > > 6
> > > >
> > > >
> > [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > > > 8
> > > >     [12758:1.0,19409:1.0,11112:1.0]
> > > > 11
> > > >
> > > >
> > [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > > > 14
> > > >
> > > >
> > [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > > > 15
> > > >
> > > >
> > [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > > > 16
> > > >
> > > >
> > [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > > 18
> > > >
> > > >
> > [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > > 20
> > > >
> > > >
> > [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > > [snip]
> > > >
> > > > *clustering; kmeans:*
> > > > [snip]
> > > > >         Weight : [props - optional]:  Point:
> > > > >         1.0 : [distance-squared=1.0193102046188427]:
> > > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > 7573:0.204,
> > > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > > > 39789:0.110, 40743:0.190, 45775:0.086]
> > > >         1.0 : [distance-squared=0.9823018320457279]:
> > > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > 5336:0.106,
> > > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > > >         1.0 : [distance-squared=0.9509142993214911]:
> > > > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > > > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
> > > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > 10225:0.081,
> > > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > > > 41280:0.065, 41696:0.072, 41947:0.118,
> > > >  43685:0.086, 44077:0.308,
> > > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > > [snip]
> > > >
> > > > *clustering; dirichlet:*
> > > > Get this complaint:
> > > > Running Dirichlet with K = 8
> > > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > > > HADOOP_CONF_DIR=
> > > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > > > classpath, will use command-line arguments only
> > > > Unknown program 'dirichlet' chosen.
> > > >
> > > > *clustering: minhash:*
> > > > Running Minhash
> > > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > > > Unknown program 'minhash' chosen.
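Both failures above end with the same "Unknown program '...' chosen." line, so anyone running through every option of the example scripts could scan the captured output for it. A minimal sketch, assuming the output has been saved as text; the helper name find_unknown_programs is hypothetical and is not part of Mahout:

```python
import re

# Pattern copied from the log lines quoted above. The helper name is
# hypothetical and only exists for this illustration.
UNKNOWN_PROGRAM = re.compile(r"Unknown program '([^']+)' chosen\.")


def find_unknown_programs(log_text):
    """Return the program names the driver reported as unknown, in order."""
    return UNKNOWN_PROGRAM.findall(log_text)


if __name__ == "__main__":
    sample = (
        "14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet\n"
        "Unknown program 'dirichlet' chosen.\n"
        "14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash\n"
        "Unknown program 'minhash' chosen.\n"
    )
    print(find_unknown_programs(sample))
```

The pattern is not anchored to the start of a line, so it also matches lines that carry mail quote prefixes.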
> > > >
> > > > *classification; standard:*
> > > > =======================================================
> > > > Summary
> > > > -------------------------------------------------------
> > > > Correctly Classified Instances          :       5384       87.7874%
> > > > Incorrectly Classified Instances        :        749       12.2126%
> > > > Total Classified Instances              :       6133
> > > >
> > > > =======================================================
> > > > Confusion Matrix
> > > > -------------------------------------------------------
> > > > a       b       c       d       <--Classified as
> > > > 2949    7       531     25       |  3512        a     = dev
> > > > 0       0       0       0        |  0           b     = general
> > > > 99      8       1763    8        |  1878        c     = user
> > > > 41      1       29      672      |  743         d     = commits
> > > >
> > > > =======================================================
> > > > Statistics
> > > > -------------------------------------------------------
> > > > Kappa                                       0.7877
> > > > Accuracy                                   87.7874%
> > > > Reliability                                 53.658%
> > > > Reliability (standard deviation)            0.4911
> > > >
> > > > *classification; complementary:*
> > > > =======================================================
> > > > Summary
> > > > -------------------------------------------------------
> > > > Correctly Classified Instances          :       5530       90.1679%
> > > > Incorrectly Classified Instances        :        603        9.8321%
> > > > Total Classified Instances              :       6133
> > > >
> > > > =======================================================
> > > > Confusion Matrix
> > > > -------------------------------------------------------
> > > > a       b       c       d       <--Classified as
> > > > 3168    0       276     68       |  3512        a     = dev
> > > > 0       0       0       0        |  0           b     = general
> > > > 196     0       1652    30       |  1878        c     = user
> > > > 25      0       8       710      |  743         d     = commits
> > > >
> > > > =======================================================
> > > > Statistics
> > > > -------------------------------------------------------
> > > > Kappa                                       0.8259
> > > > Accuracy                                   90.1679%
> > > > Reliability                                54.7459%
> > > > Reliability (standard deviation)            0.5005
> > > >
> > > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > > >
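The accuracy figures in the two summaries above can be re-derived from the confusion matrices as the diagonal sum divided by the total. A minimal sketch, with the matrices copied from the output above; the function name is hypothetical, and treating each row as the true label is an assumption based on the row totals:

```python
def accuracy_percent(matrix):
    """Percentage of instances on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Label order in both matrices: dev, general, user, commits.
STANDARD = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]
COMPLEMENTARY = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]

if __name__ == "__main__":
    print(round(accuracy_percent(STANDARD), 4))       # 87.7874
    print(round(accuracy_percent(COMPLEMENTARY), 4))  # 90.1679
```

Only accuracy is recomputed here; the Kappa and Reliability values are left as reported.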
> > > > *classification; sgd, with three categories:*
> > > > Running SGD Training
> > > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > > 24168 training files
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> > > > 0.000   0.00    none
> > > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> > > > 0.000   0.00    none
> > > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > > 1.0019413e-08   1000    -0.607  75.78   none
> > > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > > 1.0019413e-08   1200    -0.607  75.78   none
> > > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > > 1.0019413e-08   1400    -0.607  75.78   none
> > > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > > 1.0019413e-08   1500    -0.607  75.78   none
> > > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
> > > > 1.0032261e-08   2000    -0.487  82.65   none
> > > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > > > 1.0011902e-08   2500    -0.439  83.90   none
> > > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > > > 1.0011902e-08   3000    -0.439  83.90   none
> > > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> > > > 1.0000001e-08   4000    -0.351  88.14   none
> > > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> > > > 1.0000000e-08   5000    -0.378  87.10   none
> > > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
> > > > 1.0000001e-08   6000    -0.372  86.89   none
> > > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> > > > 1.0000001e-08   7000    -0.334  89.26   none
> > > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> > > > 1.0000000e-08   8000    -0.368  87.52   none
> > > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> > > > 1.0000000e-08   10000   -0.374  87.39   none
> > > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> > > > 1.0000000e-08   12000   -0.298  88.26   none
> > > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >         at java.lang.Thread.run(Thread.java:701)
> > > >
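The thread does not diagnose this exception, so the following is only one possible reading, offered as an assumption. The run was started with --categories=[3], while the classification output earlier lists four labels (dev, general, user, commits). If the gradient vector keeps one slot per non-reference category, a label index of 3 would address slot 2 of a 2-slot vector, which matches the reported index. The toy sketch below is plain Python, not Mahout code, and every name in it is hypothetical:

```python
def one_hot_without_reference(actual, num_categories):
    """Encode label `actual` in a vector with num_categories - 1 slots.

    Label 0 is treated as the reference category and maps to all zeros;
    label k > 0 sets slot k - 1. This layout is an assumption made for
    illustration, not a statement about Mahout's implementation.
    """
    slots = [0.0] * (num_categories - 1)
    if actual != 0:
        slots[actual - 1] = 1.0  # IndexError when actual >= num_categories
    return slots


if __name__ == "__main__":
    print(one_hot_without_reference(2, 3))  # highest label that fits
    try:
        one_hot_without_reference(3, 3)     # a fourth label, three categories
    except IndexError as err:
        print("out of range:", err)
```

If this reading holds, the label count seen in the data and the configured category count would need to agree; the thread itself does not confirm that.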
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > > > andrew.musselman@gmail.com> wrote:
> > > >
> > > > > Trying out the build today
> > > > >
> > > > >
> > > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> > suneel_marthi@yahoo.com
> > > > >wrote:
> > > > >
> > > > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > > > >> Release, will be rerolling the release today (in the next few hrs) and
> > > > >> putting out a new release candidate in staging.
> > > > >>
> > > > >> Thanks for reporting this Andrew P.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > > > ap.dev@outlook.com>
> > > > >> wrote:
> > > > >>
> > > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had
> > > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > > > >> have run into some problems because of the Hadoop setup.  Ran into some
> > > > >> problems in the example scripts, particularly with
> > > > >> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the
> > > > >> examples when I'm sure I've got Hadoop set up right.
> > > > >>
> > > > >>
> > > > >> Apache Maven 3.1.2-SNAPSHOT
> > > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
> > > > >> $MAHOUT_LOCAL=true
> > > > >> Hadoop 2.2.0
> > > > >>
> > > > >>
> > > > >> a) Verify that u can unpack the release (tar or zip) ... [passed (tar)]
> > > > >>
> > > > >> b) Verify u r able to compile the distro
> > > > >>
> > > > >>     mvn compile [passed with warnings]
> > > > >>
> > > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> > > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > > > >>     [WARNING] Multiple versions of scala libraries detected!
> > > > >>
> > > > >> c)  Run through the unit tests: mvn clean test
> > > > >>     mvn clean test [passed]
> > > > >>
> > > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > > > >> Please run through all the different options in each script
> > > > >>
> > > > >>     Running example scripts with $MAHOUT_LOCAL=true
> > > > >>
> > > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> > > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > > > >>
> > > > >>
> > > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > > >>     [...]
> > > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > >>         at java.lang.Class.forName0(Native Method)
> > > > >>         at java.lang.Class.forName(Class.java:171)
> > > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > > >>
> > > > >>
> > > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > > >>
> > > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > >>         at java.lang.Class.forName0(Native Method)
> > > > >>         at java.lang.Class.forName(Class.java:171)
> > > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > > > >>
> > > > >>
> > > > >>     ./classify-20newsgroups.sh ->1 [works]
> > > > >>     ./classify-20newsgroups.sh ->2 [works]
> > > > >>
> > > > >>
> > > > >>     cluster-reuters.sh ->1 [works]
> > > > >>     cluster-reuters.sh ->2 [works]
> > > > >>     cluster-reuters.sh ->3 [works]
> > > > >>
> > > > >>     Same error as noted previously in the thread:
> > > > >>
> > > > >>     cluster-reuters.sh ->4 [0 clusters]
> > > > >>
> > > > >>     [...]
> > > > >>
> > > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > > > >>     Num clusters: 0; maxDistance: 0.000000
> > > > >>     [Dunn Index] First: Infinity
> > > > >>     [Davies-Bouldin Index] First: NaN
> > > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > > > >> > From: suneel_marthi@yahoo.com
> > > > >> > Subject: MAHOUT 0.9 Release - New URL
> > > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > > > >> >
> > > > >> > Third time's a Charm!!!
> > > > >> >
> > > > >> >
> > > > >> > Here's the new URL for Mahout 0.9 Release:
> > > > >> >
> > > > >>
> > > >
> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > > > >> >
> > > > >> > For those volunteering to test this, some of the things to be
> > > > verified:
> > > > >> >
> > > > >> > a) Verify that u can unpack the release (tar or zip)
> > > > >> > b) Verify u r able to compile the distro
> > > > >> > c)  Run through the unit tests: mvn clean test
> > > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > > >> > through all the different options in each script.
> > > > >> >
> > > > >> >
> > > > >> > Committers and PMC members:
> > > > >> > ---------------------------------------
> > > > >> >
> > > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > > > >> >
> > > > >> >
> > > > >> > Thanks and Regards.
> > > > >>
> > > > >
> > > > >
> > > >
> >
 		 	   		  

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Rolled back trunk to 0.9-SNAPSHOT, please go ahead and commit any changes.




On Saturday, January 25, 2014 4:19 AM, Suneel Marthi <su...@yahoo.com> wrote:
 
I'll be rolling back the 0.9 Release today that's presently in staging in light of the issues that have been reported in the last 2 days and need to be fixed as part of the Release.

Please hold off from committing any new code to trunk meanwhile.

Thanks.






On Friday, January 24, 2014 7:36 PM, Ted Dunning <te...@gmail.com> wrote:



My schedule has opened up a bit and I can review as well.





On Fri, Jan 24, 2014 at 3:06 PM, Sebastian Schelter <ss...@googlemail.com> wrote:

I will try the next candidate again, so one vote is sure.
>On 24.01.2014 23:54, "Suneel Marthi" <su...@yahoo.com> wrote:
>
>
>> I am open to having the conversation (and a part of me feels that the
>> clusteringId fix should be in 0.9).
>>
>> If we decide to incorporate that into 0.9, I need to rollback the 0.9
>> Release that's presently out there in staging (for the 5th time in a row
>> now).
>> I am fine with doing that.
>>
>> What do you think we should do?
>>
>> a) Go ahead with 0.9 release without the fix for M-1410 .
>> b) Rollback 0.9 and include the fix for M-1410
>> c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
>> M-1410 and any other issues/enhancements that are fixed.
>>
>>
>> I am leaning towards (b), my only concern being that from my experience in
>> the past few weeks, it's become really hard to muster the minimum 3 +1 PMC
>> votes required for a release to pass.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
>> wrote:
>>
>>
>>
>> Can we hold a separate discussion about whether the clustering id issue
>> has to be in 0.9 while extending the vote deadline if necessary?
>>
>> If not, then all these votes are great and the release can go forward.
>>
>> If it is the sense that that fix has to be in, we should leave time for
>> people to reverse their votes to -1.
>>
>>
>>
>>
>> On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
>> wrote:
>>
>> Thanks for all those that volunteered.  The voting for 0.9 Release closes
>> tomorrow.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
>> wrote:
>> >
>> >Using CentOS 6.5 and hadoop 1.2.1, all passed.
>> >
>> >+1 from me
>> >
>> >Gokhan
>> >
>> >
>> >
>> >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>> >
>> >> a),b),c),d) all passed on CentOS for me
>> >>
>> >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
>> >> > Subject: Re: MAHOUT 0.9 Release - New URL
>> >> > From: ssvinarchuk@hortonworks.com
>> >> > To: dev@mahout.apache.org
>> >> >
>> >> > I did a), b), c), d) and all steps pass.
>> >> > +1
>> >> >
>> >> >
>> >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
>> >> >wrote:
>> >> >
>> >> > > +1 from me.
>> >> > >
>> >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <suneel_marthi@yahoo.com
>> >
>> >> > > wrote:
>> >> > >
>> >> > > > Fixed the issues that were reported this week and restored FP mining
>> >> > > > into the codebase.
>> >> > > >
>> >> > > > Here's the URL for the final release in staging:-
>> >> > > >
>> >> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>> >> > > >
>> >> > > > The artifacts have been signed with the following key:
>> >> > > > https://people.apache.org/keys/committer/smarthi.asc
>> >> > > >
>> >> > > >
>> >> > > > a) Verify that u can unpack the release (tar or zip)
>> >> > > > b) Verify u r able to compile the distro
>> >> > > > c)  Run through the unit tests: mvn clean test
>> >> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>> >> > > > through all the different options in each script.
>> >> > > >
>> >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
>> >> > > > finalized.
>> >> > >
>> >> > > --------------------------------------------
>> >> > > Grant Ingersoll | @gsingers
>> >> > > http://www.lucidworks.com
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> >>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
I'll be rolling back the 0.9 Release today that's presently in staging in light of the issues that have been reported in the last 2 days and need to be fixed as part of the Release.

Please hold off from committing any new code to trunk meanwhile.

Thanks.





On Friday, January 24, 2014 7:36 PM, Ted Dunning <te...@gmail.com> wrote:
 


My schedule has opened up a bit and I can review as well.





On Fri, Jan 24, 2014 at 3:06 PM, Sebastian Schelter <ss...@googlemail.com> wrote:

I will try the next candidate again, so one vote is sure.
>On 24.01.2014 23:54, "Suneel Marthi" <su...@yahoo.com> wrote:
>
>
>> I am open to having the conversation (and a part of me feels that the
>> clusteringId fix should be in 0.9).
>>
>> If we decide to incorporate that into 0.9, I need to rollback the 0.9
>> Release that's presently out there in staging (for the 5th time in a row
>> now).
>> I am fine with doing that.
>>
>> What do you think we should do?
>>
>> a) Go ahead with 0.9 release without the fix for M-1410 .
>> b) Rollback 0.9 and include the fix for M-1410
>> c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
>> M-1410 and any other issues/enhancements that are fixed.
>>
>>
>> I am leaning towards (b), my only concern being that from my experience in
>> the past few weeks, it's become really hard to muster the minimum 3 +1 PMC
>> votes required for a release to pass.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
>> wrote:
>>
>>
>>
>> Can we hold a separate discussion about whether the clustering id issue
>> has to be in 0.9 while extending the vote deadline if necessary?
>>
>> If not, then all these votes are great and the release can go forward.
>>
>> If it is the sense that that fix has to be in, we should leave time for
>> people to reverse their votes to -1.
>>
>>
>>
>>
>> On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
>> wrote:
>>
>> Thanks for all those that volunteered.  The voting for 0.9 Release closes
>> tomorrow.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
>> wrote:
>> >
>> >Using CentOS 6.5 and hadoop 1.2.1, all passed.
>> >
>> >+1 from me
>> >
>> >Gokhan
>> >
>> >
>> >
>> >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>> >
>> >> a),b),c),d) all passed on CentOS for me
>> >>
>> >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
>> >> > Subject: Re: MAHOUT 0.9 Release - New URL
>> >> > From: ssvinarchuk@hortonworks.com
>> >> > To: dev@mahout.apache.org
>> >> >
>> >> > I did a), b), c), d) and all steps pass.
>> >> > +1
>> >> >
>> >> >
>> >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
>> >> >wrote:
>> >> >
>> >> > > +1 from me.
>> >> > >
>> >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <suneel_marthi@yahoo.com
>> >
>> >> > > wrote:
>> >> > >
>> >> > > > Fixed the issues that were reported this week and restored FP mining
>> >> > > > into the codebase.
>> >> > > >
>> >> > > > Here's the URL for the final release in staging:-
>> >> > > >
>> >> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>> >> > > >
>> >> > > > The artifacts have been signed with the following key:
>> >> > > > https://people.apache.org/keys/committer/smarthi.asc
>> >> > > >
>> >> > > >
>> >> > > > a) Verify that u can unpack the release (tar or zip)
>> >> > > > b) Verify u r able to compile the distro
>> >> > > > c)  Run through the unit tests: mvn clean test
>> >> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>> >> > > > through all the different options in each script.
>> >> > > >
>> >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
>> >> > > > finalized.
>> >> > >
>> >> > > --------------------------------------------
>> >> > > Grant Ingersoll | @gsingers
>> >> > > http://www.lucidworks.com
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> >>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Ted Dunning <te...@gmail.com>.
My schedule has opened up a bit and I can review as well.




On Fri, Jan 24, 2014 at 3:06 PM, Sebastian Schelter <ssc.open@googlemail.com
> wrote:

> I will try the next candidate again, so one vote is sure.
> On 24.01.2014 23:54, "Suneel Marthi" <su...@yahoo.com> wrote:
>
> > I am open to having the conversation (and a part of me feels that the
> > clusteringId fix should be in 0.9).
> >
> > If we decide to incorporate that into 0.9, I need to rollback the 0.9
> > Release that's presently out there in staging (for the 5th time in a row
> > now).
> > I am fine with doing that.
> >
> > What do you think we should do?
> >
> > a) Go ahead with 0.9 release without the fix for M-1410 .
> > b) Rollback 0.9 and include the fix for M-1410
> > c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
> > M-1410 and any other issues/enhancements that are fixed.
> >
> >
> > > I am leaning towards (b), my only concern being that from my experience in
> > > the past few weeks, it's become really hard to muster the minimum 3 +1 PMC
> > > votes required for a release to pass.
> >
> >
> >
> >
> >
> >
> >
> >
> > On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> >
> >
> > Can we hold a separate discussion about whether the clustering id issue
> > has to be in 0.9 while extending the vote deadline if necessary?
> >
> > If not, then all these votes are great and the release can go forward.
> >
> > If it is the sense that that fix has to be in, we should leave time for
> > people for people to reverse their votes to -1.
> >
> >
> >
> >
> > On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
> > wrote:
> >
> > Thanks for all those that volunteered.  The voting for 0.9 Release closes
> > tomorrow.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
> > wrote:
> > >
> > >Using CentOS 6.5 and hadoop 1.2.1, all passed.
> > >
> > >+1 from me
> > >
> > >Gokhan
> > >
> > >
> > >
> > >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
> > wrote:
> > >
> > >> a),b),c),d) all passed on CentOS for me
> > >>
> > >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > >> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >> > From: ssvinarchuk@hortonworks.com
> > >> > To: dev@mahout.apache.org
> > >> >
> > >> > I did a), b), c), d) and all steps pass.
> > >> > +1
> > >> >
> > >> >
> > >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <
> gsingers@apache.org
> > >> >wrote:
> > >> >
> > >> > > +1 from me.
> > >> > >
> > >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <
> suneel_marthi@yahoo.com
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Fixed the issues that were reported this week and restored FP
> > mining
> > >> > > into the codebase.
> > >> > > >
> > >> > > > Here's the URL for the final release in staging:-
> > >> > > >
> > >> > >
> > >>
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > >> > > >
> > >> > > > The artifacts have been signed with the following key:
> > >> > > > https://people.apache.org/keys/committer/smarthi.asc
> > >> > > >
> > >> > > >
> > >> > > > a) Verify that u can unpack the release (tar or zip)
> > >> > > > b) Verify u r able to compile the distro
> > >> > > > c)  Run through the unit tests: mvn clean test
> > >> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > >> > > through all the different options in each script.
> > >> > > >
> > >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > >> > > finalized.
> > >> > >
> > >> > > --------------------------------------------
> > >> > > Grant Ingersoll | @gsingers
> > >> > > http://www.lucidworks.com
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Sebastian Schelter <ss...@googlemail.com>.
I will try the next candidate again, so one vote is sure.
Am 24.01.2014 23:54 schrieb "Suneel Marthi" <su...@yahoo.com>:

> I am open to having the conversation (and a part of me feels that the
> clusteringId fix should be in 0.9).
>
> If we decide to incorporate that into 0.9, I need to rollback the 0.9
> Release that's presently out there in staging (for the 5th time in a row
> now).
> I am fine with doing that.
>
> What do you think we should do?
>
> a) Go ahead with 0.9 release without the fix for M-1410 .
> b) Rollback 0.9 and include the fix for M-1410
> c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
> M-1410 and any other issues/enhancements that are fixed.
>
>
> I am leaning towards (b), my only concern being that from my experience in
> the past few weeks; its become real hard to muster the minimum 3 +1 PMC
> votes required for a release to pass.
>
>
>
>
>
>
>
>
> On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
>
>
> Can we hold a separate discussion about whether the clustering id issue
> has to be in 0.9 while extending the vote deadline if necessary?
>
> If not, then all these votes are great and the release can go forward.
>
> If it is the sense that that fix has to be in, we should leave time for
> people for people to reverse their votes to -1.
>
>
>
>
> On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Thanks for all those that volunteered.  The voting for 0.9 Release closes
> tomorrow.
> >
> >
> >
> >
> >
> >
> >
> >
> >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
> wrote:
> >
> >Using CentOS 6.5 and hadoop 1.2.1, all passed.
> >
> >+1 from me
> >
> >Gokhan
> >
> >
> >
> >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
> >
> >> a),b),c),d) all passed on CentOS for me
> >>
> >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> >> > Subject: Re: MAHOUT 0.9 Release - New URL
> >> > From: ssvinarchuk@hortonworks.com
> >> > To: dev@mahout.apache.org
> >> >
> >> > I did a), b), c), d) and all steps pass.
> >> > +1
> >> >
> >> >
> >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> >> >wrote:
> >> >
> >> > > +1 from me.
> >> > >
> >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <suneel_marthi@yahoo.com
> >
> >> > > wrote:
> >> > >
> >> > > > Fixed the issues that were reported this week and restored FP
> mining
> >> > > into the codebase.
> >> > > >
> >> > > > Here's the URL for the final release in staging:-
> >> > > >
> >> > >
> >>
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >> > > >
> >> > > > The artifacts have been signed with the following key:
> >> > > > https://people.apache.org/keys/committer/smarthi.asc
> >> > > >
> >> > > >
> >> > > > a) Verify that u can unpack the release (tar or zip)
> >> > > > b) Verify u r able to compile the distro
> >> > > > c)  Run through the unit tests: mvn clean test
> >> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> >> > > through all the different options in each script.
> >> > > >
> >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> >> > > finalized.
> >> > >
> >> > > --------------------------------------------
> >> > > Grant Ingersoll | @gsingers
> >> > > http://www.lucidworks.com
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
I am open to having the conversation (and a part of me feels that the clusteringId fix should be in 0.9). 

If we decide to incorporate that into 0.9, I need to roll back the 0.9 Release that's presently out there in staging (for the 5th time in a row now).
I am fine with doing that.  

What do you think we should do?

a) Go ahead with the 0.9 release without the fix for M-1410.
b) Roll back 0.9 and include the fix for M-1410.
c) Go ahead with 0.9, and have an interim 1.0 Release Candidate that includes M-1410 and any other issues/enhancements that are fixed.


I am leaning towards (b); my only concern is that, from my experience in the past few weeks, it has become really hard to muster the minimum 3 +1 PMC votes required for a release to pass.








On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com> wrote:
 


Can we hold a separate discussion about whether the clustering id issue has to be in 0.9 while extending the vote deadline if necessary?

If not, then all these votes are great and the release can go forward.

If it is the sense that that fix has to be in, we should leave time for people for people to reverse their votes to -1.




On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com> wrote:

Thanks for all those that volunteered.  The voting for 0.9 Release closes tomorrow.
>
>
>
>
>
>
>
>
>On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com> wrote:
>
>Using CentOS 6.5 and hadoop 1.2.1, all passed.
>
>+1 from me
>
>Gokhan
>
>
>
>On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com> wrote:
>
>> a),b),c),d) all passed on CentOS for me
>>
>> > Date: Thu, 23 Jan 2014 13:43:06 +0200
>> > Subject: Re: MAHOUT 0.9 Release - New URL
>> > From: ssvinarchuk@hortonworks.com
>> > To: dev@mahout.apache.org
>> >
>> > I did a), b), c), d) and all steps pass.
>> > +1
>> >
>> >
>> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
>> >wrote:
>> >
>> > > +1 from me.
>> > >
>> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
>> > > wrote:
>> > >
>> > > > Fixed the issues that were reported this week and restored FP mining
>> > > into the codebase.
>> > > >
>> > > > Here's the URL for the final release in staging:-
>> > > >
>> > >
>> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>> > > >
>> > > > The artifacts have been signed with the following key:
>> > > > https://people.apache.org/keys/committer/smarthi.asc
>> > > >
>> > > >
>> > > > a) Verify that u can unpack the release (tar or zip)
>> > > > b) Verify u r able to compile the distro
>> > > > c)  Run through the unit tests: mvn clean test
>> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>> > > through all the different options in each script.
>> > > >
>> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
>> > > finalized.
>> > >
>> > > --------------------------------------------
>> > > Grant Ingersoll | @gsingers
>> > > http://www.lucidworks.com
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>

Re: MAHOUT 0.9 Release - New URL

Posted by Ted Dunning <te...@gmail.com>.
Can we hold a separate discussion about whether the clustering id issue has
to be in 0.9 while extending the vote deadline if necessary?

If not, then all these votes are great and the release can go forward.

If the sense is that the fix has to be in, we should leave time for
people to reverse their votes to -1.



On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>wrote:

> Thanks for all those that volunteered.  The voting for 0.9 Release closes
> tomorrow.
>
>
>
>
>
>
>
> On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
> wrote:
>
> Using CentOS 6.5 and hadoop 1.2.1, all passed.
>
> +1 from me
>
> Gokhan
>
>
>
> On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> > a),b),c),d) all passed on CentOS for me
> >
> > > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > From: ssvinarchuk@hortonworks.com
> > > To: dev@mahout.apache.org
> > >
> > > I did a), b), c), d) and all steps pass.
> > > +1
> > >
> > >
> > > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> > >wrote:
> > >
> > > > +1 from me.
> > > >
> > > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
> > > > wrote:
> > > >
> > > > > Fixed the issues that were reported this week and restored FP
> mining
> > > > into the codebase.
> > > > >
> > > > > Here's the URL for the final release in staging:-
> > > > >
> > > >
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > > >
> > > > > > The artifacts have been signed with the following key:
> > > > > > https://people.apache.org/keys/committer/smarthi.asc
> > > > > >
> > > > > >
> > > > > > a) Verify that u can unpack the release (tar or zip)
> > > > > > b) Verify u r able to compile the distro
> > > > > > c)  Run through the unit tests: mvn clean test
> > > > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > > > through all the different options in each script.
> > > > > >
> > > > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > > > > finalized.
> > > >
> > > > --------------------------------------------
> > > > Grant Ingersoll | @gsingers
> > > > http://www.lucidworks.com
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Thanks to all those who volunteered.  The voting for the 0.9 Release closes tomorrow.







On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com> wrote:
 
Using CentOS 6.5 and hadoop 1.2.1, all passed.

+1 from me

Gokhan



On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com> wrote:

> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarchuk@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> >wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
> > > wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining
> > > into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:-
> > > >
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > >
> > > > a) Verify that u can unpack the release (tar or zip)
> > > > b) Verify u r able to compile the distro
> > > > c)  Run through the unit tests: mvn clean test
> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > through all the different options in each script.
> > > >
> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > > finalized.
> > >
> > > --------------------------------------------
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Gokhan Capan <gk...@gmail.com>.
Using CentOS 6.5 and hadoop 1.2.1, all passed.

+1 from me

Gokhan


On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com> wrote:

> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarchuk@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> >wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
> > > wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining
> > > into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:-
> > > >
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > >
> > > > a) Verify that u can unpack the release (tar or zip)
> > > > b) Verify u r able to compile the distro
> > > > c)  Run through the unit tests: mvn clean test
> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please
> run
> > > through all the different options in each script.
> > > >
> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
> to be
> > > finalized.
> > >
> > > --------------------------------------------
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>

RE: MAHOUT 0.9 Release - New URL

Posted by Andrew Palumbo <ap...@outlook.com>.
a),b),c),d) all passed on CentOS for me

> Date: Thu, 23 Jan 2014 13:43:06 +0200
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: ssvinarchuk@hortonworks.com
> To: dev@mahout.apache.org
> 
> I did a), b), c), d) and all steps pass.
> +1
> 
> 
> On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gs...@apache.org>wrote:
> 
> > +1 from me.
> >
> > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
> > wrote:
> >
> > > Fixed the issues that were reported this week and restored FP mining
> > into the codebase.
> > >
> > > Here's the URL for the final release in staging:-
> > >
> > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > >
> > > The artifacts have been signed with the following key:
> > > https://people.apache.org/keys/committer/smarthi.asc
> > >
> > >
> > > a) Verify that u can unpack the release (tar or zip)
> > > b) Verify u r able to compile the distro
> > > c)  Run through the unit tests: mvn clean test
> > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > through all the different options in each script.
> > >
> > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > finalized.
> >
> > --------------------------------------------
> > Grant Ingersoll | @gsingers
> > http://www.lucidworks.com
> >
> >
> >
> >
> >
> >
> 

Re: MAHOUT 0.9 Release - New URL

Posted by Sergey Svinarchuk <ss...@hortonworks.com>.
I did a), b), c), d) and all steps pass.
+1


On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gs...@apache.org>wrote:

> +1 from me.
>
> On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> > Fixed the issues that were reported this week and restored FP mining
> into the codebase.
> >
> > Here's the URL for the final release in staging:-
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >
> > The artifacts have been signed with the following key:
> > https://people.apache.org/keys/committer/smarthi.asc
> >
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c)  Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
> >
> > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> finalized.
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
>
>
>
>
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: MAHOUT 0.9 Release - New URL

Posted by Grant Ingersoll <gs...@apache.org>.
+1 from me.

On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com> wrote:

> Fixed the issues that were reported this week and restored FP mining into the codebase.
> 
> Here's the URL for the final release in staging:-
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> 
> The artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
> 
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> 
> Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com






Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
Likewise, a) through d) work on an Amazon AMI and Ubuntu 12.04.

+1


On Wed, Jan 22, 2014 at 6:38 PM, Suneel Marthi <su...@yahoo.com>wrote:

> Same here. I did a), b), c) and d) too and all tests pass. Here's my +1,
> if my vote counts.
>
>
>
>
>
> On Wednesday, January 22, 2014 7:11 PM, Sebastian Schelter <ss...@apache.org>
> wrote:
>
> I did a) b) c) and d) without noting any problem so far. +1 from me.
>
> --sebastian
>
>
>
> On 01/22/2014 11:55 PM, Suneel Marthi wrote:
> > Fixed the issues that were reported this week and restored FP mining
> into the codebase.
> >
> > Here's the URL for the final release in staging:-
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >
> > The artifacts have been signed with the following key:
> > https://people.apache.org/keys/committer/smarthi.asc
> >
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c)  Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
> >
> > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> finalized.
> >
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Same here. I did a), b), c) and d) too and all tests pass. Here's my +1, if my vote counts.





On Wednesday, January 22, 2014 7:11 PM, Sebastian Schelter <ss...@apache.org> wrote:
 
I did a) b) c) and d) without noting any problem so far. +1 from me.

--sebastian



On 01/22/2014 11:55 PM, Suneel Marthi wrote:
> Fixed the issues that were reported this week and restored FP mining into the codebase.
>
> Here's the URL for the final release in staging:-
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>
> The artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
>
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
> Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
>

Re: MAHOUT 0.9 Release - New URL

Posted by Sebastian Schelter <ss...@apache.org>.
I did a) b) c) and d) without noting any problem so far. +1 from me.

--sebastian


On 01/22/2014 11:55 PM, Suneel Marthi wrote:
> Fixed the issues that were reported this week and restored FP mining into the codebase.
>
> Here's the URL for the final release in staging:-
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>
> The artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
>
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
> Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
>



Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Fixed the issues that were reported this week and restored FP mining into the codebase.

Here's the URL for the final release in staging:-
https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/

The artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc


a) Verify that u can unpack the release (tar or zip)
b) Verify u r able to compile the distro
c)  Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.

Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized. 

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Fixed the issues that were reported this week and restored FP mining into the codebase.

Here's the URL for the final release in staging:-
https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/

The artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc


a) Verify that u can unpack the release (tar or zip)
b) Verify u r able to compile the distro
c)  Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.

Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized. 
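
For anyone who wants to script check a), here is a minimal Python sketch. It is not part of the Mahout release tooling. It assumes the tar or zip has already been downloaded from the staging URL, and it reads every member of the archive rather than writing files to disk, so it is only a proxy for a real unpack. The helper name `archive_is_readable` is hypothetical.

```python
import tarfile
import zipfile


def archive_is_readable(archive_path):
    """Read every member of a .zip or tar archive; return the member count.

    Hypothetical helper, not part of Mahout. Raises if the archive is corrupt.
    """
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            bad_member = zf.testzip()  # name of the first corrupt member, or None
            if bad_member is not None:
                raise ValueError(f"corrupt zip member: {bad_member}")
            return len(zf.namelist())
    # tarfile.open() with the default mode detects gzip/bzip2 compression itself.
    with tarfile.open(archive_path) as tf:
        members = tf.getmembers()
        for member in members:
            if member.isfile():
                with tf.extractfile(member) as handle:
                    while handle.read(1 << 20):  # stream in 1 MiB chunks
                        pass
        return len(members)
```

Calling it on the downloaded file, for example `archive_is_readable("mahout-distribution-0.9.tar.gz")` if that is the local file name, should return without raising when the archive is intact. Checks b) through d) still need Maven and the example scripts.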

Re: MAHOUT 0.9 Release - New URL

Posted by Frank Scholten <fr...@frankscholten.nl>.
Updated trunk and Streaming K-Means works in sequential mode:

Average distance in cluster 0 [45]: 15835.645959
Average distance in cluster 1 [2]: 12655.384293
Cluster 2 is has 1 data point. Need atleast 2 data points in a cluster for
OnlineSummarizer.
Average distance in cluster 3 [12]: 16639.304306
Average distance in cluster 4 [12466]: 1765.051250
Average distance in cluster 5 [613]: 7968.987864
Average distance in cluster 6 [453]: 11678.351990
Average distance in cluster 7 [7848]: 3475.257237
Average distance in cluster 8 [137]: 14040.611024
Cluster 9 is has 1 data point. Need atleast 2 data points in a cluster for
OnlineSummarizer.
Num clusters: 10; maxDistance: 111156.247816
[Dunn Index] First: 0.002786
[Davies-Bouldin Index] First: 53.915866
Jan 22, 2014 11:29:51 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 33654 ms (Minutes: 0.5609)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
0,15835.645959,4305.418183,-6094.066658,13494.967198,14908.353662,19013.936496,25229.795662,45,train
1,12655.384293,12655.384293,-12655.384293,0.000000,12655.384293,25310.768587,37966.152880,2,train
3,16639.304306,8137.858191,76.596986,23652.805218,17032.177174,21993.360359,26116.861135,12,train
4,1765.051250,1912.041786,53.833129,665.221968,1398.456928,2116.252442,91200.149803,12466,train
5,7968.987864,3283.509392,3106.173001,5444.653631,7154.854277,9475.107969,20961.123807,613,train
6,11678.351990,3986.046231,80.428556,8688.530291,10657.331417,13992.879084,25697.590999,453,train
7,3475.257237,2849.263422,244.613872,1701.937225,2645.839526,4362.384712,111156.247816,7848,train
8,14040.611024,3847.956007,-4400.223235,11295.103900,13063.847142,16227.853884,22973.712042,137,train
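
The per-cluster log lines above lend themselves to a quick sanity check. The Python sketch below is not part of Mahout. It parses lines of the form `Average distance in cluster N [count]: mean`, plus the single-point notices, and totals the points. The function name and the regular expressions are assumptions based only on the output shown in this message.

```python
import re

# Patterns inferred from the log output in this message (assumed, not a Mahout API).
AVG_LINE = re.compile(r"Average distance in cluster (\d+) \[(\d+)\]: ([\d.]+)")
# "is has" is the literal wording of the log message.
SINGLE_LINE = re.compile(r"Cluster (\d+) is has 1 data point")


def cluster_sizes(log_text):
    """Map cluster id -> number of points, from the summary log text."""
    sizes = {}
    for match in AVG_LINE.finditer(log_text):
        sizes[int(match.group(1))] = int(match.group(2))
    for match in SINGLE_LINE.finditer(log_text):
        sizes[int(match.group(1))] = 1
    return sizes


sample = """\
Average distance in cluster 0 [45]: 15835.645959
Average distance in cluster 1 [2]: 12655.384293
Cluster 2 is has 1 data point. Need atleast 2 data points in a cluster for
OnlineSummarizer.
Average distance in cluster 3 [12]: 16639.304306
Average distance in cluster 4 [12466]: 1765.051250
Average distance in cluster 5 [613]: 7968.987864
Average distance in cluster 6 [453]: 11678.351990
Average distance in cluster 7 [7848]: 3475.257237
Average distance in cluster 8 [137]: 14040.611024
Cluster 9 is has 1 data point. Need atleast 2 data points in a cluster for
OnlineSummarizer.
"""

sizes = cluster_sizes(sample)
print(len(sizes), sum(sizes.values()))  # prints "10 21578"
```

The cluster count of 10 agrees with the `Num clusters: 10` line in the log. The two single-point clusters have no row in the CSV summary, which is why that table has eight rows.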



On Wed, Jan 22, 2014 at 10:45 PM, Suneel Marthi <su...@yahoo.com>wrote:

> Thanks Andrew. I'll put a Release out soon.
>
>
>
>
> On Wednesday, January 22, 2014 3:52 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
>
> Everything seems to run well on my local machine:
>
> Checked out revision 1560364.
>
> CentOS 6
> Apache Maven 3.1.2-SNAPSHOT
> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> Java home: /usr/java/jdk1.6.0_45/jre
> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> family: "unix"
> Hadoop 2.2.0
>
>
> mvn clean compile -DSkipTests [OK-Several Warnings]
> mvn clean test [PASSED ALL]
> mvn clean install -DskipTests [OK]
>
>
> $MAHOUT_LOCAL=true
>
> classify-20newsgroups.sh->1 [Accuracy    89.3529%]
> classify-20newsgroups.sh->2 [Accuracy    90.8317%]
> classify-20newsgroups.sh->3 [Accuracy    76.2746%]
> classify-20newsgroups.sh->4 [cleans up]
>
> cluster-reuters.sh->1 [20 clusters]  -kmeans
> cluster-reuters.sh->2 [INFO: 20 clusters]  -fkmeans
> cluster-reuters.sh->3 [OK]  -lda
> cluster-reuters.sh->4 [10 (9) clusters- see attached]  -streaming kmeans
>
> ./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
> ./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
> ./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]
>
> ./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE
> is: 0.851264570339848]
>
>
>
>
> Attached is full output of cluster-reuters.sh->4 Streaming K-Means.
>
>
>
> From cluster-reuters.sh->4 Streaming K-Means:
>
> Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for
> OnlineSummarizer.
> Average distance in cluster 1 [2816]: 3438.913758
> Average distance in cluster 2 [112]: 20617.345993
> Average distance in cluster 3 [4]: 32504.085379
> Average distance in cluster 4 [435]: 18476.579935
> Average distance in cluster 5 [27]: 21153.167574
> Average distance in cluster 6 [15480]: 2040.864416
> Average distance in cluster 7 [1711]: 5281.742482
> Average distance in cluster 8 [964]: 15762.976239
> Average distance in cluster 9 [28]: 19762.109632
> Num clusters: 10; maxDistance: 107106.379648
>
>
>
>
> [Dunn Index] First: 0.002272
> [Davies-Bouldin Index] First: 57.871266
> Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> 1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
> 2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
> 3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
> 4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
> 5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
> 6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
> 7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
> 8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
> 9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
>
>
>
>
>
> > From: ap.dev@outlook.com
> > To: dev@mahout.apache.org; user@mahout.apache.org
> > Subject: RE: MAHOUT 0.9 Release - New URL
> > Date: Wed, 22 Jan 2014 09:37:06 -0500
> >
> > will do!
> >
> > > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > > From: suneel_marthi@yahoo.com
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > To: dev@mahout.apache.org; user@mahout.apache.org
> > >
> > > Andrew M., Andrew P. and others,
> > >
> > > Sebastian and I fixed a few issues today (for 0.9):
> > >
> > > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references to, and invocations of, algorithms that have been removed from the codebase.
> > > b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
> > > c) Resurrected the Frequent Pattern Mining implementation for 0.9.
> > >
> > > Please check out the latest code from trunk, run a build locally, and run through the example scripts.
> > >
> > > Thanks and Regards.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >
> > > *factorize-movielens-1M.sh:*
> > > RMSE is:
> > >
> > > 0.8519064098265133
> > >
> > >
> > > Sample recommendations:
> > >
> > > 2229    [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > > 5848    [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > > 3728    [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > > 1252    [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > > 634     [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > > 5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > > 2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > > 4219    [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > > 91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > > 502     [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> > >
> > > factorize-netflix.sh:
> > > References a data set that Netflix took down after the competition;
> > > the script should at least mention that the data set is no longer
> > > online.
> > >
> > >
> > > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > > > *clustering-syntheticcontrol.sh*
> > > >
> > > > *Canopy:*
> > > > [snip]
> > > >         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047,
> 34.706,
> > > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267,
> 24.625,
> > > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363,
> 32.920,
> > > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238,
> 10.051,
> > > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352,
> 13.473,
> > > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195,
> 12.002,
> > > > 10.017, 17.149, 14.850, 10.890]
> > > >         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275,
> 30.410,
> > > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666,
> 33.745,
> > > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651,
> 25.528,
> > > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070,
> 16.604,
> > > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477,
> 10.342,
> > > > 16.430, 11.898, 15.366]
> > > >         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873,
> 31.817,
> > > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472,
> 32.322,
> > > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231,
> 18.264,
> > > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463,
> 22.296,
> > > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025,
> 21.750,
> > > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310,
> 19.136,
> > > > 15.285, 22.528, 20.657, 24.129]
> > > >         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686,
> 27.511,
> > > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607,
> 33.519,
> > > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457,
> 35.025,
> > > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773,
> 11.549,
> > > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639,
> 17.236,
> > > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583,
> 14.118,
> > > > 20.229, 11.131, 9.980, 10.720]
> > > >         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587,
> 31.032,
> > > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037,
> 32.979,
> > > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016,
> 24.553,
> > > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533,
> 21.542,
> > > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836,
> 21.939,
> > > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534,
> 16.846,
> > > > 16.546, 15.927, 18.084, 17.475]
> > > >         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387,
> 31.301,
> > > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657,
> 25.295,
> > > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349,
> 18.137,
> > > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645,
> 19.457,
> > > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915,
> 13.762,
> > > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235,
> 12.042,
> > > > 19.310, 12.999, 17.460]
> > > >         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089,
> 31.371,
> > > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650,
> 24.940,
> > > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634,
> 12.694,
> > > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729,
> 6.976,
> > > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > > 11.743, 11.699, 10.152]
> > > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > > >
> > > > *K-means:*
> > > > [snip]
> > > >         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885,
> 15.459,
> > > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448,
> 30.829,
> > > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577,
> 42.108,
> > > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149,
> 15.768,
> > > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576,
> 33.145,
> > > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570,
> 38.964,
> > > > 36.946, 36.900, 32.593, 31.563]
> > > >         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178,
> 19.885,
> > > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238,
> 43.868,
> > > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720,
> 25.478,
> > > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312,
> 21.743,
> > > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620,
> 47.847,
> > > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630,
> 13.459,
> > > > 18.967, 35.473, 30.146, 45.527]
> > > >         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938,
> 13.125,
> > > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607,
> 44.751,
> > > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901,
> 34.995,
> > > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350,
> 18.330,
> > > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851,
> 46.445,
> > > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348,
> 25.439,
> > > > 41.024, 37.105, 45.623, 43.589]
> > > >         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455,
> 24.652,
> > > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642,
> 40.566,
> > > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134,
> 27.542,
> > > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612,
> 22.262,
> > > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569,
> 37.067,
> > > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812,
> 19.357,
> > > > 24.508, 34.432, 32.155, 34.839]
> > > >         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650,
> 16.004,
> > > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567,
> 44.815,
> > > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141,
> 21.746,
> > > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652,
> 31.701,
> > > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563,
> 36.244,
> > > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193,
> 13.096,
> > > > 18.111, 14.754, 27.386, 27.026]
> > > >         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547,
> 17.178,
> > > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859,
> 42.363,
> > > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679,
> 32.597,
> > > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326,
> 14.988,
> > > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118,
> 40.078,
> > > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973,
> 24.044,
> > > > 39.404, 38.034, 46.458, 44.432]
> > > >         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858,
> 13.950,
> > > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936,
> 33.534,
> > > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313,
> 23.687,
> > > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980,
> 23.705,
> > > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514,
> 37.851,
> > > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098,
> 25.685,
> > > > 28.215, 34.940, 36.910, 39.749]
> > > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > > >
> > > > *Fuzzy k-means:*
> > > > [snip]
> > > >         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873,
> 31.817,
> > > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472,
> 32.322,
> > > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231,
> 18.264,
> > > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463,
> 22.296,
> > > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025,
> 21.750,
> > > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310,
> 19.136,
> > > > 15.285, 22.528, 20.657, 24.129]
> > > >         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686,
> 27.511,
> > > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607,
> 33.519,
> > > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457,
> 35.025,
> > > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773,
> 11.549,
> > > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639,
> 17.236,
> > > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583,
> 14.118,
> > > > 20.229, 11.131, 9.980, 10.720]
> > > >         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587,
> 31.032,
> > > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037,
> 32.979,
> > > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016,
> 24.553,
> > > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533,
> 21.542,
> > > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836,
> 21.939,
> > > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534,
> 16.846,
> > > > 16.546, 15.927, 18.084, 17.475]
> > > >         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387,
> 31.301,
> > > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657,
> 25.295,
> > > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349,
> 18.137,
> > > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645,
> 19.457,
> > > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915,
> 13.762,
> > > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235,
> 12.042,
> > > > 19.310, 12.999, 17.460]
> > > >         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089,
> 31.371,
> > > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650,
> 24.940,
> > > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634,
> 12.694,
> > > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729,
> 6.976,
> > > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > > 11.743, 11.699, 10.152]
> > > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > > >
> > > > *Dirichlet and Meanshift:*
> > > > Already detailed in M-1400, deprecated jobs still referenced.
> > > >
> > > >
> > > >
> > > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > > > andrew.musselman@gmail.com> wrote:
> > > >
> > > >> *cluster-reuters.sh*
> > > >> *k-means:*
> > > >>
> > > >> [snip]
> > > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > > >>         Top Terms:
> > > >>                 banks                                   =>
> > > >> 3.841823268955143
> > > >>                 bank                                    =>
> > > >>  3.80633066361209
> > > >>                 debt                                    =>
> > > >>  3.28065219870794
> > > >>                 said                                    =>
> > > >>  2.5965700942088583
> > > >>                 he                                      =>
> > > >> 2.335682813857497
> > > >>                 foreign                                 =>
> > > >>  2.2217853688201403
> > > >>                 billion                                 =>
> > > >>  2.1970193848291335
> > > >>                 would                                   =>
> > > >>  1.9932392063955617
> > > >>                 loans                                   =>
> > > >>  1.9309276792854233
> > > >>                 interest                                =>
> > > >>  1.787324501938
> > > >>                 have                                    =>
> > > >> 1.762981951432578
> > > >>                 its                                     =>
> > > >>  1.7615109954971866
> > > >>                 which                                   =>
> > > >>  1.5822081148036862
> > > >>                 has                                     =>
> > > >>  1.5600708189041956
> > > >>                 dlrs                                    =>
> > > >>  1.5571038313005996
> > > >>                 finance                                 =>
> > > >>  1.5539758811252924
> > > >>                 new                                     =>
> > > >>  1.5176015811577555
> > > >>                 had                                     =>
> > > >>  1.5138723701401844
> > > >>                 brazil                                  =>
> > > >>  1.5083369853593172
> > > >>                 payments                                =>
> > > >>  1.4539044255886517
> > > >>         Weight : [props - optional]:  Point:
> > > >>
> > > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009,
> 0.4:0.007,
> > > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > > >>         Top Terms:
> > > >>                 vs                                      =>
> > > >> 6.126130791333171
> > > >>                 net                                     =>
> > > >> 4.012191567277523
> > > >>                 cts                                     =>
> > > >> 3.822006848832744
> > > >>                 shr                                     =>
> > > >>  3.6786004856764527
> > > >>                 mln                                     =>
> > > >>  2.9011643584038698
> > > >>                 loss                                    =>
> > > >> 2.788368861463607
> > > >>                 qtr                                     =>
> > > >> 2.714140225051522
> > > >>                 revs                                    =>
> > > >>  2.4739861236454717
> > > >>                 profit                                  =>
> > > >>  1.8146888090247015
> > > >>                 note                                    =>
> > > >>  1.7977163272138388
> > > >>                 dlrs                                    =>
> > > >>  1.6164390808155846
> > > >>                 avg                                     =>
> > > >>  1.3901765773336587
> > > >>                 shrs                                    =>
> > > >>  1.3856326531419314
> > > >>                 mths                                    =>
> > > >>  1.3168717272038506
> > > >>                 4th                                     =>
> > > >>  1.2161158425617289
> > > >>                 oper                                    =>
> > > >> 1.182419473776814
> > > >>                 year                                    =>
> > > >> 1.178086061733047
> > > >>                 nine                                    =>
> > > >>  1.0670554836445316
> > > >>                 3rd                                     =>
> > > >> 1.041334410056592
> > > >>                 inc                                     =>
> > > >>  1.0019361981554935
> > > >>         Weight : [props - optional]:  Point:
> > > >>
> > > >>
> > > >> Inter-Cluster Density: 0.45562152681859414
> > > >> Intra-Cluster Density: 0.6952712632167628
> > > >> CDbw Inter-Cluster Density: 0.0
> > > >> CDbw Intra-Cluster Density: 16.486930227598684
> > > >> CDbw Separation: 194.49005884464628
> > > >>
> > > >> *fuzzy k-means:*
> > > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001,
> 0.007050:0.001,
> > > >> 0.01:0.005, 0.02:0.002, 0.0
> > > >>         Top Terms:
> > > >>                 said                                    =>
> > > >>  1.8665592354713065
> > > >>                 its                                     =>
> > > >>  1.1335212213411592
> > > >>                 pct                                     =>
> > > >>  1.0862816801353348
> > > >>                 dlrs                                    =>
> > > >>  1.0854998884993752
> > > >>                 mln                                     =>
> > > >> 1.043163996400643
> > > >>                 from                                    =>
> > > >>  0.9684961110525736
> > > >>                 has                                     =>
> > > >> 0.912161511978058
> > > >>                 company                                 =>
> > > >>  0.8754186972808333
> > > >>                 mar                                     =>
> > > >>  0.8675333452422878
> > > >>                 inc                                     =>
> > > >>  0.7678617590362815
> > > >>                 would                                   =>
> > > >>  0.7610968883652675
> > > >>                 he                                      =>
> > > >>  0.7459988770503974
> > > >>                 which                                   =>
> > > >>  0.7435613119406804
> > > >>                 year                                    =>
> > > >>  0.7302840632748394
> > > >>                 u.s                                     =>
> > > >>  0.7281061062439116
> > > >>                 shares                                  =>
> > > >>  0.7260764102983083
> > > >>                 corp                                    =>
> > > >>  0.7179807367808658
> > > >>                 new                                     =>
> > > >>  0.7044203783157115
> > > >>                 stock                                   =>
> > > >>  0.6962010978721442
> > > >>                 have                                    =>
> > > >>  0.6464265467298506
> > > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001,
> 0.007050:0.001,
> > > >> 0.01:0.004, 0.02:0.002, 0.02
> > > >>         Top Terms:
> > > >>                 said                                    =>
> > > >> 1.864911184196927
> > > >>                 dlrs                                    =>
> > > >> 1.199286689822081
> > > >>                 mln                                     =>
> > > >>  1.1802134783562215
> > > >>                 pct                                     =>
> > > >>  1.1529704214798124
> > > >>                 its                                     =>
> > > >>  1.1184398851519701
> > > >>                 from                                    =>
> > > >> 1.016647848050332
> > > >>                 company                                 =>
> > > >> 0.894703604722841
> > > >>                 mar                                     =>
> > > >> 0.879986159541356
> > > >>                 has                                     =>
> > > >>  0.8642799128491316
> > > >>                 year                                    =>
> > > >>  0.8271823503717782
> > > >>                 inc                                     =>
> > > >>  0.7871293745341424
> > > >>                 corp                                    =>
> > > >> 0.737705498468879
> > > >>                 which                                   =>
> > > >> 0.722975201852743
> > > >>                 would                                   =>
> > > >> 0.708000816484415
> > > >>                 u.s                                     =>
> > > >>  0.7073294276173905
> > > >>                 billion                                 =>
> > > >>  0.7055723996916351
> > > >>                 he                                      =>
> > > >>  0.7042684217823294
> > > >>                 new                                     =>
> > > >>  0.6834737905434939
> > > >>                 shares                                  =>
> > > >>  0.6753327384172428
> > > >>                 stock                                   =>
> > > >>  0.6576225144041699
> > > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001,
> 0.007050:0.001,
> > > >> 0.01:0.006, 0.02:0.002, 0.02
> > > >>         Top Terms:
> > > >>                 said                                    =>
> > > >>  1.8796076179735086
> > > >>                 its                                     =>
> > > >> 1.172025965452378
> > > >>                 dlrs                                    =>
> > > >> 1.130422792460914
> > > >>                 pct                                     =>
> > > >> 1.082038255241358
> > > >>                 mln                                     =>
> > > >>  1.0772146872767114
> > > >>                 company                                 =>
> > > >>  0.9662235879639138
> > > >>                 from                                    =>
> > > >>  0.9473172871605616
> > > >>                 has                                     =>
> > > >>  0.9224712965830099
> > > >>                 mar                                     =>
> > > >>  0.8769325856924421
> > > >>                 inc                                     =>
> > > >>  0.8360245257169788
> > > >>                 shares                                  =>
> > > >>  0.8334595641384324
> > > >>                 stock                                   =>
> > > >>  0.7704621839612175
> > > >>                 corp                                    =>
> > > >>  0.7682400250301806
> > > >>                 which                                   =>
> > > >>  0.7389988207856137
> > > >>                 would                                   =>
> > > >>  0.7339708917389389
> > > >>                 year                                    =>
> > > >>  0.7088414843731325
> > > >>                 new                                     =>
> > > >>  0.7038109468655172
> > > >>                 he                                      =>
> > > >>  0.6993994455501005
> > > >>                 u.s                                     =>
> > > >>  0.6772649147622415
> > > >>                 share                                   =>
> > > >>  0.6241804830055171
> > > >>
> > > >> *lda:*
> > > >>
> > > >> [snip]
> > > >> 21539
> > > >>
> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > > >> 21540
> > > >>
> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > > >> 21541
> > > >>
> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > > >> 21542
> > > >>
> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > > >> 21543
> > > >>
> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > > >> 21544
> > > >>
> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > > >> 21545
> > > >>
> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > > >> 21546
> > > >>
> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > > >> 21547
> > > >>
> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > > >> 21548
> > > >>
> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > > >> 21549
> > > >>
> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > > >> 21550
> > > >>
> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > > >> 21551
> > > >>
> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > > >> 21552
> > > >>
> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > > >> 21553
> > > >>
> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > > >> 21554
> > > >>
> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > > >> 21555
> > > >>
> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > > >> 21556
> > > >>
> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > > >> 21557
> > > >>
> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > > >> 21558
> > > >>
> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > > >> 21559
> > > >>
> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > > >> 21560
> > > >>
> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > > >> 21561
> > > >>
> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > > >> 21562
> > > >>
> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > > >> 21563
> > > >>
> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > > >> 21564
> > > >>
> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > > >> 21565
> > > >>
> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > > >> 21566
> > > >>
> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > > >> 21567
> > > >>
> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > > >> 21568
> > > >>
> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > > >> 21569
> > > >>
> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > > >> 21570
> > > >>
> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > > >> 21571
> > > >>
> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > > >> 21572
> > > >>
> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > > >> 21573
> > > >>
> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > > >> 21574
> > > >>
> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > > >> 21575
> > > >>
> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > > >> 21576
> > > >>
> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > > >> 21577
> > > >>
> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > > >>
> > > >>
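Each dumped vector above is a brace-delimited list of key:weight pairs. The thread does not describe the format, so the following is only a minimal sketch that assumes this layout holds for every line. It parses one line copied from the dump, sums the listed weights, and finds the largest. Keys are kept as strings because the dump contains both `0.1` and `0.10`, which would collide if they were parsed as numbers. `parse_weights` is a hypothetical helper, not part of Mahout.

```python
# One line copied verbatim from the dump above.
line = (
    "{0.06:0.7383744333942458,0.02:0.07321126682102753,"
    "0.07:0.05910378841288311,0.10:0.056907223730939045,"
    "0:0.02739559786902668,0.055:0.02228913751272657,"
    "0.1:0.00943274247398869,0.073:0.007301445750018608,"
    "0.03:0.0027711985062277246,0.046:0.0022569760697531112}"
)


def parse_weights(text):
    """Split '{k:v,k:v,...}' into a list of (key, weight) pairs.

    Keys stay strings; weights become floats (Java-style 'E-4'
    exponents are accepted by float()).
    """
    body = text.strip().lstrip("{").rstrip("}")
    pairs = []
    for item in body.split(","):
        key, value = item.rsplit(":", 1)
        pairs.append((key, float(value)))
    return pairs


pairs = parse_weights(line)
total = sum(weight for _, weight in pairs)
top_key, top_weight = max(pairs, key=lambda kv: kv[1])

print(len(pairs), "entries, listed weights sum to", round(total, 4))
print("largest weight:", top_key, top_weight)
```

The listed weights sum to slightly less than 1, which would be consistent with the dump showing only the largest entries of each vector; that reading is an inference, not something the log states.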
> > > >> *Streaming k-means:*
> > > >>
> > > >> [snip]
> > > >> INFO: Number of Centroids: 0
> > > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > > >> WARNING: job_local23982482_0001
> > > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > > >>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > > >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > > >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > > >>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > > >>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > > >>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > > >>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > > >>
> > > >> [snip]
> > > >>
> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > > >> Num clusters: 0; maxDistance: 0.000000
> > > >> [Dunn Index] First: Infinity
> > > >> [Davies-Bouldin Index] First: NaN
> > > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > >>
> > > >>
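A note on reading the exception above. Guava's `Preconditions.checkArgument` substitutes only `%s` placeholders and appends any unmatched arguments in square brackets, which is why the template shows up unformatted, followed by `[10.000000149011612, 0]`. Read that way, the job asked for roughly 10% of 0 vectors as a test split, which matches the "Number of Centroids: 0" line. The first argument looks like 0.1 stored as a single-precision float and then scaled by 100; that is an inference from the value, not something confirmed in the thread. The sketch below illustrates both points in Python. `split_sizes` is a hypothetical stand-in for the failing check, not Mahout's code.

```python
import struct


def as_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split_sizes(num_vectors, test_fraction):
    """Hypothetical train/test split check (an assumption, not Mahout's code).

    Raises ValueError when either side of the split would be empty.
    """
    test_size = int(num_vectors * test_fraction)
    if test_size <= 0 or test_size >= num_vectors:
        raise ValueError(
            "Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test"
            % (test_fraction * 100, num_vectors)
        )
    return num_vectors - test_size, test_size


# 0.1 held as a 32-bit float and widened back to a double.
test_fraction = as_float32(0.1)
print(test_fraction * 100)  # close to, but not exactly, 10

# With zero input vectors the check fails, as in the log.
try:
    split_sizes(0, test_fraction)
except ValueError as err:
    print(err)

# With a non-empty input the split goes through.
print(split_sizes(100, test_fraction))
```

If this reading is right, the exception is a symptom: the reducer received no centroids, so the thing to investigate is why the input to the reduce step was empty.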
> > > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> > > >> andrew.musselman@gmail.com> wrote:
> > > >>
> > > >>> *classify-20newsgroups.sh*
> > > >>>
> > > >>> *Complementary naive bayes:*
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances          :      11207       98.9406%
> > > >>> Incorrectly Classified Instances        :        120        1.0594%
> > > >>> Total Classified Instances              :      11327
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > >>> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> > > >>> 475 0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0   |  478  a = alt.atheism
> > > >>> 0   597 1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0   |  605  b = comp.graphics
> > > >>> 0   1   620 3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0   |  627  c = comp.os.ms-windows.misc
> > > >>> 1   1   1   593 2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   |  599  d = comp.sys.ibm.pc.hardware
> > > >>> 0   1   1   0   568 0   1   0   0   0   1   1   2   0   0   0   0   1   0   0   |  576  e = comp.sys.mac.hardware
> > > >>> 0   4   2   0   0   581 0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  587  f = comp.windows.x
> > > >>> 0   0   0   1   2   0   571 3   0   0   1   1   4   1   0   0   0   0   0   0   |  584  g = misc.forsale
> > > >>> 0   0   0   1   0   0   0   589 1   0   0   1   1   0   0   0   0   0   0   0   |  593  h = rec.autos
> > > >>> 0   0   0   0   0   0   0   1   565 0   0   0   0   0   1   0   0   0   0   0   |  567  i = rec.motorcycles
> > > >>> 0   0   0   0   0   0   0   0   0   600 2   0   0   0   1   0   0   0   0   0   |  603  j = rec.sport.baseball
> > > >>> 0   0   0   0   0   0   0   0   0   1   584 0   0   0   0   0   0   0   0   0   |  585  k = rec.sport.hockey
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   579 0   0   0   0   0   1   0   0   |  580  l = sci.crypt
> > > >>> 0   0   0   1   3   0   2   0   0   2   0   0   567 1   2   1   0   0   0   0   |  579  m = sci.electronics
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   0   1   605 0   0   0   0   0   0   |  606  n = sci.med
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   602 0   0   0   0   0   |  602  o = sci.space
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   602 0   0   1   0   |  604  p = soc.religion.christian
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   556 0   0   0   |  556  q = talk.politics.mideast
> > > >>> 0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0   568 0   0   |  571  r = talk.politics.guns
> > > >>> 11  0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4   338 2   |  369  s = talk.religion.misc
> > > >>> 0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0   447 |  456  t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa                                       0.9806
> > > >>> Accuracy                                   98.9406%
> > > >>> Reliability                                94.0932%
> > > >>> Reliability (standard deviation)            0.2163
> > > >>>
> > > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > > >>> + echo 'Testing on holdout set'
> > > >>> Testing on holdout set
> > > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > > >>>
> > > >>> [snip]
> > > >>>
> > > >>> INFO: Complementary Results:
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances          :       6715       89.3071%
> > > >>> Incorrectly Classified Instances        :        804       10.6929%
> > > >>> Total Classified Instances              :       7519
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > >>> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> > > >>> 298 0   0   0   0   0   0   0   0   1   0   0   0   1   2   5   1   0   13  0   |  321  a = alt.atheism
> > > >>> 0   298 11  6   1   12  2   2   1   1   3   8   3   4   2   4   1   4   4   1   |  368  b = comp.graphics
> > > >>> 1   17  286 16  4   9   6   3   2   0   1   0   1   7   1   0   2   1   0   1   |  358  c = comp.os.ms-windows.misc
> > > >>> 2   6   11  309 9   5   14  8   1   0   2   0   6   4   2   0   1   2   1   0   |  383  d = comp.sys.ibm.pc.hardware
> > > >>> 0   10  8   7   334 7   5   5   2   0   3   0   2   1   1   0   1   1   0   0   |  387  e = comp.sys.mac.hardware
> > > >>> 1   13  7   8   2   355 2   0   2   0   0   5   1   1   3   0   0   1   0   0   |  401  f = comp.windows.x
> > > >>> 0   7   11  29  12  9   268 16  8   4   3   2   6   4   2   1   3   1   2   3   |  391  g = misc.forsale
> > > >>> 0   1   0   0   3   0   7   362 8   2   2   1   2   0   2   0   1   2   0   4   |  397  h = rec.autos
> > > >>> 0   0   0   1   0   0   1   0   423 0   0   0   2   1   0   1   0   0   0   0   |  429  i = rec.motorcycles
> > > >>> 0   0   1   0   0   0   0   2   2   371 8   0   2   3   0   2   0   0   0   0   |  391  j = rec.sport.baseball
> > > >>> 0   0   1   0   0   0   1   0   0   2   409 0   0   0   0   0   0   0   0   1   |  414  k = rec.sport.hockey
> > > >>> 0   0   1   2   1   0   1   0   0   0   0   404 0   0   0   0   0   1   0   1   |  411  l = sci.crypt
> > > >>> 0   5   4   11  1   3   7   9   2   5   3   3   339 2   6   0   1   1   2   1   |  405  m = sci.electronics
> > > >>> 0   4   0   1   0   0   0   1   0   1   1   0   3   367 3   1   2   0   0   0   |  384  n = sci.med
> > > >>> 0   1   2   0   1   0   2   0   0   1   0   0   1   1   375 0   1   0   0   0   |  385  o = sci.space
> > > >>> 4   2   1   1   0   0   1   1   2   0   0   1   1   5   1   367 4   0   1   1   |  393  p = soc.religion.christian
> > > >>> 0   1   0   0   0   0   0   0   0   2   0   0   0   0   0   2   378 0   1   0   |  384  q = talk.politics.mideast
> > > >>> 0   0   0   0   0   2   1   1   1   1   0   3   0   3   0   0   2   319 2   4   |  339  r = talk.politics.guns
> > > >>> 32  0   0   1   0   0   0   0   0   1   1   1   0   2   2   26  5   7   175 6   |  259  s = talk.religion.misc
> > > >>> 0   0   0   2   0   0   0   0   0   1   2   2   0   1   2   1   10  18  2   278 |  319  t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa                                       0.8594
> > > >>> Accuracy                                   89.3071%
> > > >>> Reliability                                 84.611%
> > > >>> Reliability (standard deviation)            0.2148
> > > >>>
> > > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> > > >>>
> > > >>>
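As a quick sanity check on the two Complementary naive Bayes summaries above, the reported accuracy is just correct / (correct + incorrect). A minimal sketch in Python, using only the counts printed in the log (`accuracy_percent` is a helper name introduced here for illustration):

```python
def accuracy_percent(correct, incorrect):
    """Return (total instances, accuracy in percent) from summary counts."""
    total = correct + incorrect
    return total, 100.0 * correct / total


# Counts copied from the two "Summary" blocks above.
runs = {
    "first run": (11207, 120),
    "holdout run": (6715, 804),
}

for name, (correct, incorrect) in runs.items():
    total, pct = accuracy_percent(correct, incorrect)
    print(f"{name}: {correct}/{total} = {pct:.4f}%")
```

Both results agree with the logged totals (11327 and 7519) and with the accuracy figures of 98.9406% and 89.3071%.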
> > > >>> *Naive bayes:*
> > > >>> INFO: Standard NB Results:
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances          :      11286       99.0869%
> > > >>> Incorrectly Classified Instances        :        104        0.9131%
> > > >>> Total Classified Instances              :      11390
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > >>> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> > > >>> 474 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2   1   |  477  a = alt.atheism
> > > >>> 0   566 0   2   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  569  b = comp.graphics
> > > >>> 0   10  590 29  2   4   1   0   0   0   0   0   1   0   0   0   0   0   0   1   |  638  c = comp.os.ms-windows.misc
> > > >>> 0   0   0   596 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   |  596  d = comp.sys.ibm.pc.hardware
> > > >>> 0   0   0   0   575 0   1   0   0   0   0   0   1   0   0   0   0   0   0   0   |  577  e = comp.sys.mac.hardware
> > > >>> 0   2   2   2   0   593 1   0   0   0   0   0   0   0   1   0   0   0   0   0   |  601  f = comp.windows.x
> > > >>> 0   0   0   1   0   0   589 1   0   0   1   0   2   0   0   0   0   0   0   0   |  594  g = misc.forsale
> > > >>> 0   0   0   0   0   0   0   594 0   0   0   0   0   0   0   0   0   0   0   0   |  594  h = rec.autos
> > > >>> 0   0   0   0   0   0   0   0   611 0   0   0   0   0   0   0   0   0   0   0   |  611  i = rec.motorcycles
> > > >>> 0   0   0   0   0   0   0   0   0   616 1   0   0   0   0   0   0   0   0   0   |  617  j = rec.sport.baseball
> > > >>> 0   0   0   0   0   0   1   0   0   0   620 0   0   0   0   0   0   0   0   0   |  621  k = rec.sport.hockey
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   580 0   0   0   0   0   1   0   0   |  581  l = sci.crypt
> > > >>> 0   0   0   3   1   0   0   0   0   0   0   0   571 0   0   0   0   0   0   0   |  575  m = sci.electronics
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   0   2   583 0   0   0   0   0   0   |  585  n = sci.med
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   0   0   1   599 0   0   0   0   0   |  600  o = sci.space
> > > >>> 0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   615 0   0   0   0   |  616  p = soc.religion.christian
> > > >>> 1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   560 0   0   0   |  562  q = talk.politics.mideast
> > > >>> 0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   548 0   1   |  551  r = talk.politics.guns
> > > >>> 10  0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   2   344 1   |  359  s = talk.religion.misc
> > > >>> 0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   2   0   462 |  466  t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa                                       0.9847
> > > >>> Accuracy                                   99.0869%
> > > >>> Reliability                                94.3334%
> > > >>> Reliability (standard deviation)            0.2169
> > > >>>
> > > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > > >>> + echo 'Testing on holdout set'
> > > >>> Testing on holdout set
> > > >>>
> > > >>> [snip]
> > > >>>
> > > >>> INFO: Standard NB Results:
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances          :       6718       90.1019%
> > > >>> Incorrectly Classified Instances        :        738        9.8981%
> > > >>> Total Classified Instances              :       7456
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > >>> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> > > >>> 294 0   0   0   0   0   0   0   0   0   0   2   0   1   1   6   1   1   16  0   |  322  a = alt.atheism
> > > >>> 0   345 6   14  6   11  6   0   0   0   0   5   7   1   3   0   0   0   0   0   |  404  b = comp.graphics
> > > >>> 2   29  177 78  22  19  9   1   0   0   0   4   2   0   1   1   0   0   1   1   |  347  c = comp.os.ms-windows.misc
> > > >>> 1   9   2   335 18  2   10  0   0   0   1   0   8   0   0   0   0   0   0   0   |  386  d = comp.sys.ibm.pc.hardware
> > > >>> 1   4   2   13  347 3   5   1   0   0   1   0   7   1   0   0   0   1   0   0   |  386  e = comp.sys.mac.hardware
> > > >>> 0   20  0   4   0   352 4   0   0   0   0   0   1   1   3   0   1   0   1   0   |  387  f = comp.windows.x
> > > >>> 0   2   0   21  5   1   323 7   2   2   0   2   12  0   3   0   0   0   0   1   |  381  g = misc.forsale
> > > >>> 0   1   0   0   1   0   15  363 8   1   0   0   4   1   0   0   0   1   0   1   |  396  h = rec.autos
> > > >>> 0   1   0   0   0   0   6   6   370 0   0   0   0   1   0   0   0   0   1   0   |  385  i = rec.motorcycles
> > > >>> 1   0   0   1   1   0   2   1   2   362 5   0   2   0   0   0   0   0   0   0   |  377  j = rec.sport.baseball
> > > >>> 0   0   0   1   2   0   0   0   0   3   371 0   0   0   0   0   0   0   0   1   |  378  k = rec.sport.hockey
> > > >>> 0   3   1   0   1   0   2   0   0   0   0   396 0   1   0   0   1   1   1   3   |  410  l = sci.crypt
> > > >>> 0   7   0   7   7   2   6   4   0   0   0   1   369 2   2   0   0   0   0   2   |  409  m = sci.electronics
> > > >>> 0   3   0   2   1   0   2   0   0   0   0   1   4   383 4   0   0   1   0   4   |  405  n = sci.med
> > > >>> 0   5   0   0   1   0   3   0   0   0   0   0   1   0   374 1   0   0   1   1   |  387  o = sci.space
> > > >>> 6   2   0   1   1   0   0   1   0   1   0   0   1   5   0   352 2   1   7   1   |  381  p = soc.religion.christian
> > > >>> 1   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   373 1   0   1   |  378  q = talk.politics.mideast
> > > >>> 0   0   0   0   0   0   1   0   1   0   0   2   0   0   0   0   0   346 2   7   |  359  r = talk.politics.guns
> > > >>> 26  1   0   1   0   0   0   2   0   1   1   0   0   1   1   20  2   6   200 7   |  269  s = talk.religion.misc
> > > >>> 1   0   0   0   0   0   0   2   0   0   1   0   0   2   2   0   1   14  0   286 |  309  t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa                                       0.8726
> > > >>> Accuracy                                   90.1019%
> > > >>> Reliability                                85.4491%
> > > >>> Reliability (standard deviation)            0.2222
> > > >>>
> > > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> > > >>>
> > > >>> *SGD:*
> > > >>> 7532 test files
> > > >>>
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances          :       5649            75%
> > > >>> Incorrectly Classified Instances        :       1883            25%
> > > >>> Total Classified Instances              :       7532
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > >>> a     b     c     d     e     f     g     h     i     j     k     l     m     n     o     p     q     r     s     t     <--Classified as
> > > >>> 186   6     3     10    5     0     33    4     13    15    7     1     24    15    3     15    5     5     29    15     |  394  a = sci.space
> > > >>> 5     309   0     3     2     5     0     0     0     1     9     21    2     0     0     18    4     4     1     1      |  385  b = comp.sys.mac.hardware
> > > >>> 4     1     101   3     0     1     63    0     7     0     1     1     5     16    3     0     3     7     1     34     |  251  c = talk.religion.misc
> > > >>> 11    12    1     265   1     10    3     0     0     17    10    11    5     2     0     11    3     6     21    0      |  389  d = comp.graphics
> > > >>> 2     1     1     0     349   2     3     0     3     2     6     1     5     1     0     2     15    2     1     2      |  398  e = rec.motorcycles
> > > >>> 7     20    3     19    2     254   6     0     2     11    2     39    7     2     0     4     2     2     9     3      |  394  f = comp.os.ms-windows.misc
> > > >>> 2     1     13    0     0     0     247   0     1     1     3     0     6     2     4     0     2     3     5     29     |  319  g = alt.atheism
> > > >>> 1     1     0     0     2     0     2     361   0     1     2     0     2     0     0     1     3     22    0     1      |  399  h = rec.sport.hockey
> > > >>> 3     0     3     1     0     0     5     0     161   0     1     2     12    102   0     0     1     2     11    6      |  310  i = talk.politics.misc
> > > >>> 2     8     0     19    0     19    0     0     1     294   10    11    4     2     0     5     0     3     11    6      |  395  j = comp.windows.x
> > > >>> 2     10    0     1     1     0     0     0     0     1     347   13    2     1     0     5     3     2     2     0      |  390  k = misc.forsale
> > > >>> 1     36    0     6     1     25    0     0     1     6     10    257   2     1     0     34    6     0     6     0      |  392  l = comp.sys.ibm.pc.hardware
> > > >>> 2     2     2     2     1     0     12    0     0     6     10    4     312   5     2     13    11    3     3     6      |  396  m = sci.med
> > > >>> 2     0     3     2     1     0     0     1     13    0     5     1     2     314   2     0     2     2     10    4      |  364  n = talk.politics.guns
> > > >>> 1     0     2     1     1     0     34    1     33    1     3     0     1     8     271   1     4     5     6     3      |  376  o = talk.politics.mideast
> > > >>> 3     14    0     8     2     8     3     1     1     7     12    29    6     2     1     245   13    2     32    4      |  393  p = sci.electronics
> > > >>> 3     3     0     2     11    0     1     0     2     1     11    6     4     2     0     11    330   4     4     1      |  396  q = rec.autos
> > > >>> 0     0     1     0     1     0     4     12    3     1     3     0     0     0     0     5     6     359   1     1      |  397  r = rec.sport.baseball
> > > >>> 0     1     0     0     0     1     0     0     3     3     0     0     3     2     1     6     1     6     366   3      |  396  s = sci.crypt
> > > >>> 0     2     11    1     1     0     40    0     1     2     3     4     2     1     0     5     0     2     2     321    |  398  t = soc.religion.christian
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa                                       0.7073
> > > >>> Accuracy                                        75%
> > > >>> Reliability                                70.6238%
> > > >>> Reliability (standard deviation)            0.2187
> > > >>> Log-likelihood                mean      :    -1.1182
> > > >>>                               25%-ile   :    -1.6911
> > > >>>                               75%-ile   :    -0.0803
> > > >>>
> > > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <
> suneel_marthi@yahoo.com>wrote:
> > > >>>
> > > >>>> Thanks Andrew for reporting that. I rolled back the release to fix this
> > > >>>> and a few other issues.
> > > >>>>
> > > >>>> We have removed asf-examples*.sh from trunk as the sample file at the
> > > >>>> URL mentioned in your email is not available.
> > > >>>> This is something we need to fix and restore in 1.0.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> > > >>>> ap.dev@outlook.com> wrote:
> > > >>>>
> > > >>>> from the asf-email-examples.sh script:
> > > >>>>
> > > >>>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> > > >>>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
> > > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > > >>>>
> > > >>>> It looks like the
> > > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > > >>>> link is down.
> > > >>>>
> > > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > > >>>> > From: andrew.musselman@gmail.com
> > > >>>> > To: dev@mahout.apache.org
> > > >>>> >
> > > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > > >>>> >
> > > >>>> >
> > > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> > > >>>> suneel_marthi@yahoo.com>wrote:
> > > >>>> >
> > > >>>> > > Thanks Andrew M. I see that some of the example scripts need to be
> > > >>>> > > fixed, as they still refer to deprecated algorithms.
> > > >>>> > > I also see that Streaming KMeans has failed for you.
> > > >>>> > >
> > > >>>> > > I'll be rolling back the release today to fix these issues.
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > > >>>> > > andrew.musselman@gmail.com> wrote:
> > > >>>> > >
> > > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> > > >>>> > > 64-bit Linux AMI from tarball.
> > > >>>> > >
> > > >>>> > > All tests pass.
> > > >>>> > >
> > > >>>> > > *Output of examples:*
> > > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > > >>>> > > <http://mahout.apache.org>:*
> > > >>>> > > *recommendations:*
> > > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > > >>>> > > 1     [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > > >>>> > > 4     [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > > >>>> > > 6     [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > > >>>> > > 8     [12758:1.0,19409:1.0,11112:1.0]
> > > >>>> > > 11    [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > > >>>> > > 14    [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > > >>>> > > 15    [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > > >>>> > > 16    [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > >>>> > > 18    [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > >>>> > > 19    [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > >>>> > > 20    [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > >>>> > > [snip]
> > > >>>> > >
> > > >>>> > > *clustering; kmeans:*
> > > >>>> > > [snip]
> > > >>>> > >         Weight : [props - optional]:  Point:
> > > >>>> > >         1.0 :
> > > >>>> > >  [distance-squared=1.0193102046188427]:
> > > >>>> > >
> /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > > >>>> 7573:0.204,
> > > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> > > >>>> 9779:0.159,
> > > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065,
> 17007:0.244,
> > > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140,
> 24649:0.095,
> > > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156,
> 31559:0.075,
> > > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075,
> 38378:0.130,
> > > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > > >>>> > >         1.0 : [distance-squared=0.9823018320457279]:
> > > >>>> > >
> /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > > >>>> 5336:0.106,
> > > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> > > >>>> 7832:0.072,
> > > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094,
> 19359:0.177,
> > > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098,
> 24649:0.092,
> > > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150,
> 30459:0.072,
> > > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113,
> 36491:0.073,
> > > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106,
> 45775:0.083]
> > > >>>> > >         1.0 : [distance-squared=0.9509142993214911]:
> > > >>>> > >
> > > >>>>
> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > > >>>> > >  4419:0.076,
> > > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > > >>>> 7235:0.048,
> > > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > > >>>> 7683:0.077,
> > > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > > >>>> 10225:0.081,
> > > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139,
> 11663:0.087,
> > > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051,
> 14352:0.061,
> > > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149,
> 19774:0.124,
> > > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078,
> 23974:0.105,
> > > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151,
> 25128:0.052,
> > > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046,
> 31727:0.104,
> > > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069,
> 33112:0.177,
> > > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042,
> 35795:0.066,
> > > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200,
> 37111:0.071,
> > > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079,
> 41155:0.167,
> > > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > > >>>> > >  43685:0.086, 44077:0.308,
> > > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052,
> 46766:0.074,
> > > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > >>>> > > [snip]
> > > >>>> > >
> > > >>>> > > *clustering; dirichlet:*
> > > >>>> > > Get this complaint:
> > > >>>> > > Running Dirichlet with K = 8
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > > >>>> > > Unknown program 'dirichlet' chosen.
> > > >>>> > >
> > > >>>> > > *clustering: minhash:*
> > > >>>> > > Running Minhash
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > > >>>> > > Unknown program 'minhash' chosen.
> > > >>>> > >
> > > >>>> > > *classification; standard:*
> > > >>>> > > =======================================================
> > > >>>> > > Summary
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Correctly Classified Instances          :       5384       87.7874%
> > > >>>> > > Incorrectly Classified Instances        :        749       12.2126%
> > > >>>> > > Total Classified Instances              :       6133
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Confusion Matrix
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > a       b       c       d       <--Classified as
> > > >>>> > > 2949    7       531     25       |  3512        a     = dev
> > > >>>> > > 0       0       0       0        |  0           b     = general
> > > >>>> > > 99      8       1763    8        |  1878        c     = user
> > > >>>> > > 41      1       29      672      |  743         d     = commits
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Statistics
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Kappa                                       0.7877
> > > >>>> > > Accuracy                                   87.7874%
> > > >>>> > > Reliability                                 53.658%
> > > >>>> > > Reliability (standard deviation)            0.4911
> > > >>>> > >
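As a cross-check on the "classification; standard" output quoted above, the sketch below recomputes the row totals and the accuracy from the four-label confusion matrix. The matrix values are copied from the log; the function names are illustrative and are not part of Mahout. It assumes rows are true labels and columns are predicted labels, and it does not attempt to reproduce the Kappa or Reliability figures, whose exact formulas are not given in the thread.

```python
# Confusion matrix from the "classification; standard" run quoted above.
# Assumption: rows are true labels, columns are predicted labels.
LABELS = ["dev", "general", "user", "commits"]
MATRIX = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]


def row_totals(matrix):
    """Number of test instances per true label."""
    return [sum(row) for row in matrix]


def accuracy(matrix):
    """Fraction of instances on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    print("row totals:", dict(zip(LABELS, row_totals(MATRIX))))
    print("accuracy: %.4f%%" % (100 * accuracy(MATRIX)))
```

The diagonal sums to 5384 out of 6133 instances, which agrees with the "Correctly Classified Instances" line in the summary.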
> > > >>>> > > *classification; complementary:*
> > > >>>> > > =======================================================
> > > >>>> > > Summary
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Correctly Classified Instances          :       5530       90.1679%
> > > >>>> > > Incorrectly Classified Instances        :        603        9.8321%
> > > >>>> > > Total Classified Instances              :       6133
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Confusion Matrix
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > a       b       c       d       <--Classified as
> > > >>>> > > 3168    0       276     68       |  3512        a     = dev
> > > >>>> > > 0       0       0       0        |  0           b     = general
> > > >>>> > > 196     0       1652    30       |  1878        c     = user
> > > >>>> > > 25      0       8       710      |  743         d     = commits
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Statistics
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Kappa                                       0.8259
> > > >>>> > > Accuracy                                   90.1679%
> > > >>>> > > Reliability                                54.7459%
> > > >>>> > > Reliability (standard deviation)            0.5005
> > > >>>> > >
> > > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > > >>>> > >
> > > >>>> > > *classification; sgd, with three categories:*
> > > >>>> > > Running SGD Training
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > >>>> > > 24168 training files
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       1       0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       2       0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       3       0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       4       0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       6       0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       8       0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       10      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       12      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       15      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       20      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       25      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       30      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       40      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       50      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       60      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       70      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       80      0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       100     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       120     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       140     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       150     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       200     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       250     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       300     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       400     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       500     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       600     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       700     0.000   0.00    none
> > > >>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       800     0.000   0.00    none
> > > >>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1000    -0.607  75.78   none
> > > >>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1200    -0.607  75.78   none
> > > >>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1400    -0.607  75.78   none
> > > >>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1500    -0.607  75.78   none
> > > >>>> > > 0.24    43686.00    17924.00    329.50  1.0571799e-08   1.0032261e-08   2000    -0.487  82.65   none
> > > >>>> > > 0.24    49753.00    21610.00    330.71  1.3770070e-08   1.0011902e-08   2500    -0.439  83.90   none
> > > >>>> > > 0.24    49753.00    21610.00    330.71  1.3770070e-08   1.0011902e-08   3000    -0.439  83.90   none
> > > >>>> > > 0.32    50635.00    28531.00    437.09  1.0551175e-08   1.0000001e-08   4000    -0.351  88.14   none
> > > >>>> > > 0.32    50635.00    32642.00    437.09  1.0551175e-08   1.0000000e-08   5000    -0.378  87.10   none
> > > >>>> > > 0.32    50635.00    36461.00    437.09  1.0556652e-08   1.0000001e-08   6000    -0.372  86.89   none
> > > >>>> > > 0.32    50635.00    37768.00    437.09  1.0576742e-08   1.0000001e-08   7000    -0.334  89.26   none
> > > >>>> > > 0.32    50635.00    38807.00    437.09  1.0576742e-08   1.0000000e-08   8000    -0.368  87.52   none
> > > >>>> > > 0.32    50635.00    44731.00    437.09  1.0576716e-08   1.0000000e-08   10000   -0.374  87.39   none
> > > >>>> > > 0.32    50635.00    45672.00    437.09  1.0576716e-08   1.0000000e-08   12000   -0.298  88.26   none
> > > >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > > >>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > >>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > >>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > > >>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > > >>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > >>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > >>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>>> > >         at java.lang.Thread.run(Thread.java:701)
> > > >>>> > >
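The thread does not diagnose the ArrayIndexOutOfBoundsException above, so the following is only one possible reading. The SGD run was started with --categories=[3], while the Naive Bayes output earlier in the same message lists four labels (dev, general, user, commits). The toy model below is not Mahout code; it assumes a target vector with one slot fewer than the number of categories, with category 0 as the reference class, and shows how a label index equal to the configured category count would fall off the end with index 2. The function name and this layout are assumptions made for illustration.

```python
def target_vector(actual, num_categories):
    """Toy one-hot target with num_categories - 1 slots.

    Assumption (not taken from the thread): category 0 is the reference
    class and has no slot of its own, so label k > 0 maps to slot k - 1.
    """
    target = [0.0] * (num_categories - 1)
    if actual != 0:
        # Fails with IndexError when actual >= num_categories.
        target[actual - 1] = 1.0
    return target


if __name__ == "__main__":
    print(target_vector(2, 3))  # label 2 fits in a 3-category model
    try:
        target_vector(3, 3)     # a fourth label (index 3) does not fit
    except IndexError as err:
        print("label 3 with 3 categories:", err)
```

If this reading is right, the failing write targets slot 2 of a two-element vector, which would be consistent with the index reported in the trace.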
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > > >>>> > > andrew.musselman@gmail.com> wrote:
> > > >>>> > >
> > > >>>> > > > Trying out the build today
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> > > >>>> suneel_marthi@yahoo.com
> > > >>>> > > >wrote:
> > > >>>> > > >
> > > >>>> > > >> This is an issue (a trivial one, though) that needs to be fixed
> > > >>>> > > >> for the 0.9 release. I will be rerolling the release today (in the
> > > >>>> > > >> next few hours) and putting out a new release candidate in staging.
> > > >>>> > > >>
> > > >>>> > > >> Thanks for reporting this Andrew P.
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
> > > >>>> > > >>
> > > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
> > > >>>> > > >> a bit of trouble getting the Hadoop natives to compile, and therefore may
> > > >>>> > > >> have run into some problems because of the Hadoop setup. I ran into some
> > > >>>> > > >> problems in the example scripts, particularly with
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > > >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
> > > >>>> > > >> $MAHOUT_LOCAL=true
> > > >>>> > > >> Hadoop 2.2.0
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
> > > >>>> > > >>
> > > >>>> > > >> b) Verify u r able to compile the distro
> > > >>>> > > >>
> > > >>>> > > >>     mvn compile- [passed with warnings]
> > > >>>> > > >>
> > > >>>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> > > >>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > > >>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > > >>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
> > > >>>> > > >>
> > > >>>> > > >> c)  Run through the unit tests: mvn clean test
> > > >>>> > > >>     mvn clean test [passed]
> > > >>>> > > >>
> > > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > > >>>> > > >> Please run through all the different options in each script
> > > >>>> > > >>
> > > >>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
> > > >>>> > > >>
> > > >>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> > > >>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > > >>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > >>>> > > >>     [...]
> > > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > >>>> > > >>
> > > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on
> > > >>>> > > >>     classpath, will use command-line arguments only
> > > >>>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
> > > >>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>     cluster-reuters.sh ->1 [works]
> > > >>>> > > >>     cluster-reuters.sh ->2 [works]
> > > >>>> > > >>     cluster-reuters.sh ->3 [works]
> > > >>>> > > >>
> > > >>>> > > >>     Same error as noted previously in the thread:
> > > >>>> > > >>
> > > >>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
> > > >>>> > > >>
> > > >>>> > > >>     [...]
> > > >>>> > > >>
> > > >>>> > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > > >>>> > > >>     Num clusters: 0; maxDistance: 0.000000
> > > >>>> > > >>     [Dunn Index] First: Infinity
> > > >>>> > > >>     [Davies-Bouldin Index] First: NaN
> > > >>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > > >>>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > > >>>> > > >> > From: suneel_marthi@yahoo.com
> > > >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > > >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > > >>>> > > >> >
> > > >>>> > > >> > Third time's a Charm!!!
> > > >>>> > > >> >
> > > >>>> > > >> >
> > > >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> > > >>>> > > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > > >>>> > > >> >
> > > >>>> > > >> > For those volunteering to test this, some of the things to be verified:
> > > >>>> > > >> >
> > > >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> > > >>>> > > >> > b) Verify u r able to compile the distro
> > > >>>> > > >> > c)  Run through the unit tests: mvn clean test
> > > >>>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the
> > > >>>> > > >> > different options in each script.
> > > >>>> > > >> >
> > > >>>> > > >> >
> > > >>>> > > >> > Committers and PMC members:
> > > >>>> > > >> > ---------------------------------------
> > > >>>> > > >> >
> > > >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > > >>>> > > >> >
> > > >>>> > > >> >
> > > >>>> > > >> > Thanks and Regards.

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew. I'll put a Release out soon. 




On Wednesday, January 22, 2014 3:52 PM, Andrew Palumbo <ap...@outlook.com> wrote:
 
 
Everything seems to run well on my local machine:

Checked out revision 1560364.

CentOS 6
Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
Hadoop 2.2.0


mvn clean compile -DSkipTests [OK-Several Warnings]
mvn clean test [PASSED ALL]
mvn clean install -DskipTests [OK]


$MAHOUT_LOCAL=true 

classify-20newsgroups.sh->1 [Accuracy    89.3529%]
classify-20newsgroups.sh->2 [Accuracy    90.8317%]
classify-20newsgroups.sh->3 [Accuracy    76.2746%]
classify-20newsgroups.sh->4 [cleans up] 

cluster-reuters.sh->1 [20 clusters]  -kmeans
cluster-reuters.sh->2 [INFO: 20 clusters]  -fkmeans
cluster-reuters.sh->3 [OK]  -lda
cluster-reuters.sh->4 [10 (9) clusters- see attached]  -streaming kmeans

./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]

./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]




Attached is the full output of cluster-reuters.sh->4 Streaming K-Means.



From cluster-reuters.sh->4 Streaming K-Means:

Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 1 [2816]: 3438.913758
Average distance in cluster 2 [112]: 20617.345993
Average distance in cluster 3 [4]: 32504.085379
Average distance in cluster 4 [435]: 18476.579935
Average distance in cluster 5 [27]: 21153.167574
Average distance in cluster 6 [15480]: 2040.864416
Average distance in cluster 7 [1711]: 5281.742482
Average distance in cluster 8 [964]: 15762.976239
Average distance in cluster 9 [28]: 19762.109632
Num clusters: 10; maxDistance: 107106.379648




[Dunn Index] First: 0.002272
[Davies-Bouldin Index] First: 57.871266
Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
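
For anyone who wants to sanity-check the summary table above without rereading it by eye, here is a minimal Python sketch. It is not part of Mahout or the example scripts; the function names (load_summary, negative_q0_clusters) are made up for illustration, and the rows are simply pasted from the output above. It totals the count column and lists the clusters whose distance.q0 value is negative, which may be worth a second look.

```python
import csv
import io

# Rows copied verbatim from the cluster-reuters.sh ->4 output above.
SUMMARY = """\
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
"""


def load_summary(text):
    """Parse the summary CSV into a list of dicts with numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "cluster": int(row["cluster"]),
            "mean": float(row["distance.mean"]),
            "q0": float(row["distance.q0"]),
            "count": int(row["count"]),
        })
    return rows


def negative_q0_clusters(rows):
    """Return the ids of clusters whose distance.q0 column is below zero."""
    return [r["cluster"] for r in rows if r["q0"] < 0]


rows = load_summary(SUMMARY)
print("total points in listed clusters:", sum(r["count"] for r in rows))
print("clusters with negative distance.q0:", negative_q0_clusters(rows))
```

Cluster 0 does not appear in the table (it has a single data point), so the total covers clusters 1 through 9 only.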





> From: ap.dev@outlook.com
> To: dev@mahout.apache.org; user@mahout.apache.org
> Subject: RE: MAHOUT 0.9 Release - New URL
> Date: Wed, 22 Jan 2014 09:37:06 -0500
> 
> will do!
> 
> > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > To: dev@mahout.apache.org; user@mahout.apache.org
> > 
> > Andrew M., Andrew P. and others,
> > 
> > Sebastian and I fixed a few issues today (for 0.9):
> > 
> > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references/invocations to algorithms that have been removed from the codebase.
> > b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
> > c) Resurrected the Frequent Pattern Mining implementation for 0.9.
> > 
> > Please check out the latest code from trunk, run a build locally and run through the example scripts.
> > 
> > Thanks and Regards.
> > 
> > 
> > 
> > 
> > 
> > 
> > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
> > 
> > *factorize-movielens-1M.sh:*
> > RMSE is: 0.8519064098265133
> > 
> > Sample recommendations:
> > 
> > 2229    [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > 5848    [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > 3728    [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > 1252    [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > 634     [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > 5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > 2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > 4219    [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > 91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > 502     [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> > 
> > factorize-netflix.sh:
> > References a data set that Netflix took down after the competition; the
> > script should at least mention that the data set is no longer "online".
> > 
> > 
> > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > 
> > > *cluster-syntheticcontrol.sh*
> > >
> > > *Canopy:*
> > > [snip]
> > >         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > > 10.017, 17.149, 14.850, 10.890]
> > >         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > > 16.430, 11.898, 15.366]
> > >         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > >         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > >         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > >         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > >         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > >
> > > *K-means:*
> > > [snip]
> > >         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > > 36.946, 36.900, 32.593, 31.563]
> > >         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > > 18.967, 35.473, 30.146, 45.527]
> > >         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > > 41.024, 37.105, 45.623, 43.589]
> > >         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > > 24.508, 34.432, 32.155, 34.839]
> > >         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > > 18.111, 14.754, 27.386, 27.026]
> > >         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > > 39.404, 38.034, 46.458, 44.432]
> > >         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > > 28.215, 34.940, 36.910, 39.749]
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > >
> > > *Fuzzy k-means:*
> > > [snip]
> > >         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > >         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > >         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > >         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > >         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > >
> > > *Dirichlet and Meanshift:*
> > > Already detailed in M-1400, deprecated jobs still referenced.
> > >
> > >
> > >
> > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >
> > >> *cluster-reuters.sh*
> > >> *k-means:*
> > >>
> > >> [snip]
> > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > >>         Top Terms:
> > >>                 banks      => 3.841823268955143
> > >>                 bank       => 3.80633066361209
> > >>                 debt       => 3.28065219870794
> > >>                 said       => 2.5965700942088583
> > >>                 he         => 2.335682813857497
> > >>                 foreign    => 2.2217853688201403
> > >>                 billion    => 2.1970193848291335
> > >>                 would      => 1.9932392063955617
> > >>                 loans      => 1.9309276792854233
> > >>                 interest   => 1.787324501938
> > >>                 have       => 1.762981951432578
> > >>                 its        => 1.7615109954971866
> > >>                 which      => 1.5822081148036862
> > >>                 has        => 1.5600708189041956
> > >>                 dlrs       => 1.5571038313005996
> > >>                 finance    => 1.5539758811252924
> > >>                 new        => 1.5176015811577555
> > >>                 had        => 1.5138723701401844
> > >>                 brazil     => 1.5083369853593172
> > >>                 payments   => 1.4539044255886517
> > >>         Weight : [props - optional]:  Point:
> > >>
> > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > >>         Top Terms:
> > >>                 vs         => 6.126130791333171
> > >>                 net        => 4.012191567277523
> > >>                 cts        => 3.822006848832744
> > >>                 shr        => 3.6786004856764527
> > >>                 mln        => 2.9011643584038698
> > >>                 loss       => 2.788368861463607
> > >>                 qtr        => 2.714140225051522
> > >>                 revs       => 2.4739861236454717
> > >>                 profit     => 1.8146888090247015
> > >>                 note       => 1.7977163272138388
> > >>                 dlrs       => 1.6164390808155846
> > >>                 avg        => 1.3901765773336587
> > >>                 shrs       => 1.3856326531419314
> > >>                 mths       => 1.3168717272038506
> > >>                 4th        => 1.2161158425617289
> > >>                 oper       => 1.182419473776814
> > >>                 year       => 1.178086061733047
> > >>                 nine       => 1.0670554836445316
> > >>                 3rd        => 1.041334410056592
> > >>                 inc        => 1.0019361981554935
> > >>         Weight : [props - optional]:  Point:
> > >>
> > >>
> > >> Inter-Cluster Density: 0.45562152681859414
> > >> Intra-Cluster Density: 0.6952712632167628
> > >> CDbw Inter-Cluster Density: 0.0
> > >> CDbw Intra-Cluster Density: 16.486930227598684
> > >> CDbw Separation: 194.49005884464628
> > >>
> > >> *fuzzy k-means:*
> > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.005, 0.02:0.002, 0.0
> > >>         Top Terms:
> > >>                 said       => 1.8665592354713065
> > >>                 its        => 1.1335212213411592
> > >>                 pct        => 1.0862816801353348
> > >>                 dlrs       => 1.0854998884993752
> > >>                 mln        => 1.043163996400643
> > >>                 from       => 0.9684961110525736
> > >>                 has        => 0.912161511978058
> > >>                 company    => 0.8754186972808333
> > >>                 mar        => 0.8675333452422878
> > >>                 inc        => 0.7678617590362815
> > >>                 would      => 0.7610968883652675
> > >>                 he         => 0.7459988770503974
> > >>                 which      => 0.7435613119406804
> > >>                 year       => 0.7302840632748394
> > >>                 u.s        => 0.7281061062439116
> > >>                 shares     => 0.7260764102983083
> > >>                 corp       => 0.7179807367808658
> > >>                 new        => 0.7044203783157115
> > >>                 stock      => 0.6962010978721442
> > >>                 have       => 0.6464265467298506
> > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.004, 0.02:0.002, 0.02
> > >>         Top Terms:
> > >>                 said                                    => 1.864911184196927
> > >>                 dlrs                                    => 1.199286689822081
> > >>                 mln                                     => 1.1802134783562215
> > >>                 pct                                     => 1.1529704214798124
> > >>                 its                                     => 1.1184398851519701
> > >>                 from                                    => 1.016647848050332
> > >>                 company                                 => 0.894703604722841
> > >>                 mar                                     => 0.879986159541356
> > >>                 has                                     => 0.8642799128491316
> > >>                 year                                    => 0.8271823503717782
> > >>                 inc                                     => 0.7871293745341424
> > >>                 corp                                    => 0.737705498468879
> > >>                 which                                   => 0.722975201852743
> > >>                 would                                   => 0.708000816484415
> > >>                 u.s                                     => 0.7073294276173905
> > >>                 billion                                 => 0.7055723996916351
> > >>                 he                                      => 0.7042684217823294
> > >>                 new                                     => 0.6834737905434939
> > >>                 shares                                  => 0.6753327384172428
> > >>                 stock                                   => 0.6576225144041699
> > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.006, 0.02:0.002, 0.02
> > >>         Top Terms:
> > >>                 said                                    => 1.8796076179735086
> > >>                 its                                     => 1.172025965452378
> > >>                 dlrs                                    => 1.130422792460914
> > >>                 pct                                     => 1.082038255241358
> > >>                 mln                                     => 1.0772146872767114
> > >>                 company                                 => 0.9662235879639138
> > >>                 from                                    => 0.9473172871605616
> > >>                 has                                     => 0.9224712965830099
> > >>                 mar                                     => 0.8769325856924421
> > >>                 inc                                     => 0.8360245257169788
> > >>                 shares                                  => 0.8334595641384324
> > >>                 stock                                   => 0.7704621839612175
> > >>                 corp                                    => 0.7682400250301806
> > >>                 which                                   => 0.7389988207856137
> > >>                 would                                   => 0.7339708917389389
> > >>                 year                                    => 0.7088414843731325
> > >>                 new                                     => 0.7038109468655172
> > >>                 he                                      => 0.6993994455501005
> > >>                 u.s                                     => 0.6772649147622415
> > >>                 share                                   => 0.6241804830055171
> > >>
> > >> *lda:*
> > >>
> > >> [snip]
> > >> 21539
> > >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > >> 21540
> > >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > >> 21541
> > >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > >> 21542
> > >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > >> 21543
> > >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > >> 21544
> > >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > >> 21545
> > >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > >> 21546
> > >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > >> 21547
> > >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > >> 21548
> > >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > >> 21549
> > >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > >> 21550
> > >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > >> 21551
> > >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > >> 21552
> > >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > >> 21553
> > >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > >> 21554
> > >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > >> 21555
> > >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > >> 21556
> > >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > >> 21557
> > >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > >> 21558
> > >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > >> 21559
> > >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > >> 21560
> > >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > >> 21561
> > >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > >> 21562
> > >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > >> 21563
> > >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > >> 21564
> > >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > >> 21565
> > >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > >> 21566
> > >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > >> 21567
> > >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > >> 21568
> > >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > >> 21569
> > >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > >> 21570
> > >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > >> 21571
> > >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > >> 21572
> > >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > >> 21573
> > >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > >> 21574
> > >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > >> 21575
> > >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > >> 21576
> > >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > >> 21577
> > >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > >>
> > >>
> > >> *Streaming k-means:*
> > >>
> > >> [snip]
> > >> INFO: Number of Centroids: 0
> > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > >> WARNING: job_local23982482_0001
> > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > >>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > >>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > >>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > >>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > >>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > >>
> > >> [snip]
> > >>
> > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index] First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
> > >>
> > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >>
> > >>> *classify-20newsgroups.sh*
> > >>>
> > >>> *Complementary naive bayes:*
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :      11207       98.9406%
> > >>> Incorrectly Classified Instances        :        120        1.0594%
> > >>> Total Classified Instances              :      11327
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
> > >>> 475     0       0       1       0       0       0       0       0       0       0       0       0       0       1       0       1       0       0       0       |  478         a     = alt.atheism
> > >>> 0       597     1       1       0       1       1       0       0       0       0       1       0       2       1       0       0       0       0       0       |  605         b     = comp.graphics
> > >>> 0       1       620     3       0       1       0       0       0       0       0       1       0       0       1       0       0       0       0       0       |  627         c     = comp.os.ms-windows.misc
> > >>> 1       1       1       593     2       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       |  599         d     = comp.sys.ibm.pc.hardware
> > >>> 0       1       1       0       568     0       1       0       0       0       1       1       2       0       0       0       0       1       0       0       |  576         e     = comp.sys.mac.hardware
> > >>> 0       4       2       0       0       581     0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  587         f     = comp.windows.x
> > >>> 0       0       0       1       2       0       571     3       0       0       1       1       4       1       0       0       0       0       0       0       |  584         g     = misc.forsale
> > >>> 0       0       0       1       0       0       0       589     1       0       0       1       1       0       0       0       0       0       0       0       |  593         h     = rec.autos
> > >>> 0       0       0       0       0       0       0       1       565     0       0       0       0       0       1       0       0       0       0       0       |  567         i     = rec.motorcycles
> > >>> 0       0       0       0       0       0       0       0       0       600     2       0       0       0       1       0       0       0       0       0       |  603         j     = rec.sport.baseball
> > >>> 0       0       0       0       0       0       0       0       0       1       584     0       0       0       0       0       0       0       0       0       |  585         k     = rec.sport.hockey
> > >>> 0       0       0       0       0       0       0       0       0       0       0       579     0       0       0       0       0       1       0       0       |  580         l     = sci.crypt
> > >>> 0       0       0       1       3       0       2       0       0       2       0       0       567     1       2       1       0       0       0       0       |  579         m     = sci.electronics
> > >>> 0       0       0       0       0       0       0       0       0       0       0       0       1       605     0       0       0       0       0       0       |  606         n     = sci.med
> > >>> 0       0       0       0       0       0       0       0       0       0       0       0       0       0       602     0       0       0       0       0       |  602         o     = sci.space
> > >>> 0       0       0       0       0       0       0       0       0       0       0       0       0       1       0       602     0       0       1       0       |  604         p     = soc.religion.christian
> > >>> 0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       556     0       0       0       |  556         q     = talk.politics.mideast
> > >>> 0       0       1       0       0       0       0       0       0       0       0       1       0       0       1       0       0       568     0       0       |  571         r     = talk.politics.guns
> > >>> 11      0       0       0       0       0       0       0       0       1       0       0       0       1       3       8       1       4       338     2       |  369         s     = talk.religion.misc
> > >>> 0       0       0       0       0       0       0       0       0       0       1       0       0       0       1       0       3       4       0       447     |  456         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.9806
> > >>> Accuracy                                   98.9406%
> > >>> Reliability                                94.0932%
> > >>> Reliability (standard deviation)            0.2163
> > >>>
> > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Complementary Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       6715       89.3071%
> > >>> Incorrectly Classified Instances        :        804       10.6929%
> > >>> Total Classified Instances              :       7519
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
> > >>> 298     0       0       0       0       0       0       0       0       1       0       0       0       1       2       5       1       0       13      0       |  321         a     = alt.atheism
> > >>> 0       298     11      6       1       12      2       2       1       1       3       8       3       4       2       4       1       4       4       1       |  368         b     = comp.graphics
> > >>> 1       17      286     16      4       9       6       3       2       0       1       0       1       7       1       0       2       1       0       1       |  358         c     = comp.os.ms-windows.misc
> > >>> 2       6       11      309     9       5       14      8       1       0       2       0       6       4       2       0       1       2       1       0       |  383         d     = comp.sys.ibm.pc.hardware
> > >>> 0       10      8       7       334     7       5       5       2       0       3       0       2       1       1       0       1       1       0       0       |  387         e     = comp.sys.mac.hardware
> > >>> 1       13      7       8       2       355     2       0       2       0       0       5       1       1       3       0       0       1       0       0       |  401         f     = comp.windows.x
> > >>> 0       7       11      29      12      9       268     16      8       4       3       2       6       4       2       1       3       1       2       3       |  391         g     = misc.forsale
> > >>> 0       1       0       0       3       0       7       362     8       2       2       1       2       0       2       0       1       2       0       4       |  397         h     = rec.autos
> > >>> 0       0       0       1       0       0       1       0       423     0       0       0       2       1       0       1       0       0       0       0       |  429         i     = rec.motorcycles
> > >>> 0       0       1       0       0       0       0       2       2       371     8       0       2       3       0       2       0       0       0       0       |  391         j     = rec.sport.baseball
> > >>> 0       0       1       0       0       0       1       0       0       2       409     0       0       0       0       0       0       0       0       1       |  414         k     = rec.sport.hockey
> > >>> 0       0       1       2       1       0       1       0       0       0       0       404     0       0       0       0       0       1       0       1       |  411         l     = sci.crypt
> > >>> 0       5       4       11      1       3       7       9       2       5       3       3       339     2       6       0       1       1       2       1       |  405         m     = sci.electronics
> > >>> 0       4       0       1       0       0       0       1       0       1       1       0       3       367     3       1       2       0       0       0       |  384         n     = sci.med
> > >>> 0       1       2       0       1       0       2       0       0       1       0       0       1       1       375     0       1       0       0       0       |  385         o     = sci.space
> > >>> 4       2       1       1       0       0       1       1       2       0       0       1       1       5       1       367     4       0       1       1       |  393         p     = soc.religion.christian
> > >>> 0       1       0       0       0       0       0       0       0       2       0       0       0       0       0       2       378     0       1       0       |  384         q     = talk.politics.mideast
> > >>> 0       0       0       0       0       2       1       1       1       1       0       3       0       3       0       0       2       319     2       4       |  339         r     = talk.politics.guns
> > >>> 32      0       0       1       0       0       0       0       0       1       1       1       0       2       2       26      5       7       175     6       |  259         s     = talk.religion.misc
> > >>> 0       0       0       2       0       0       0       0       0       1       2       2       0       1       2       1       10      18      2       278     |  319         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.8594
> > >>> Accuracy                                   89.3071%
> > >>> Reliability                                 84.611%
> > >>> Reliability (standard deviation)            0.2148
> > >>>
> > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> > >>>
> > >>>
> > >>> *Naive bayes:*
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :      11286       99.0869%
> > >>> Incorrectly Classified Instances        :        104        0.9131%
> > >>> Total Classified Instances              :      11390
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 474     0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       2
> > >>>    1         |  477         a     = alt.atheism
> > >>> 0       566     0       2       0       1       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  569         b     = comp.graphics
> > >>> 0       10      590     29      2       4       1       0       0
> > >>> 0       0       0       1       0       0       0       0       0       0
> > >>>    1         |  638         c     = comp.os.ms-windows.misc
> > >>> 0       0       0       596     0       0       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  596         d     = comp.sys.ibm.pc.hardware
> > >>> 0       0       0       0       575     0       1       0       0
> > >>> 0       0       0       1       0       0       0       0       0       0
> > >>>    0         |  577         e     = comp.sys.mac.hardware
> > >>> 0       2       2       2       0       593     1       0       0
> > >>> 0       0       0       0       0       1       0       0       0       0
> > >>>    0         |  601         f     = comp.windows.x
> > >>> 0       0       0       1       0       0       589     1       0
> > >>> 0       1       0       2       0       0       0       0       0       0
> > >>>    0         |  594         g     = misc.forsale
> > >>> 0       0       0       0       0       0       0       594     0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  594         h     = rec.autos
> > >>> 0       0       0       0       0       0       0       0       611
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  611         i     = rec.motorcycles
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 616     1       0       0       0       0       0       0       0       0
> > >>>    0         |  617         j     = rec.sport.baseball
> > >>> 0       0       0       0       0       0       1       0       0
> > >>> 0       620     0       0       0       0       0       0       0       0
> > >>>    0         |  621         k     = rec.sport.hockey
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       580     0       0       0       0       0       1       0
> > >>>    0         |  581         l     = sci.crypt
> > >>> 0       0       0       3       1       0       0       0       0
> > >>> 0       0       0       571     0       0       0       0       0       0
> > >>>    0         |  575         m     = sci.electronics
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       0       2       583     0       0       0       0       0
> > >>>    0         |  585         n     = sci.med
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       1       599     0       0       0       0
> > >>>    0         |  600         o     = sci.space
> > >>> 0       1       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       615     0       0       0
> > >>>    0         |  616         p     = soc.religion.christian
> > >>> 1       0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       1       560     0       0
> > >>>    0         |  562         q     = talk.politics.mideast
> > >>> 0       0       1       0       0       0       0       0       0
> > >>> 0       0       1       0       0       0       0       0       548     0
> > >>>    1         |  551         r     = talk.politics.guns
> > >>> 10      0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       1       1       0       2       344
> > >>>    1         |  359         s     = talk.religion.misc
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       1       1       0       0       0       0       2       0
> > >>>    462       |  466         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.9847
> > >>> Accuracy                                   99.0869%
> > >>> Reliability                                94.3334%
> > >>> Reliability (standard deviation)            0.2169
> > >>>
> > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       6718       90.1019%
> > >>> Incorrectly Classified Instances        :        738        9.8981%
> > >>> Total Classified Instances              :       7456
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 294     0       0       0       0       0       0       0       0
> > >>> 0       0       2       0       1       1       6       1       1       16
> > >>>     0         |  322         a     = alt.atheism
> > >>> 0       345     6       14      6       11      6       0       0
> > >>> 0       0       5       7       1       3       0       0       0       0
> > >>>    0         |  404         b     = comp.graphics
> > >>> 2       29      177     78      22      19      9       1       0
> > >>> 0       0       4       2       0       1       1       0       0       1
> > >>>    1         |  347         c     = comp.os.ms-windows.misc
> > >>> 1       9       2       335     18      2       10      0       0
> > >>> 0       1       0       8       0       0       0       0       0       0
> > >>>    0         |  386         d     = comp.sys.ibm.pc.hardware
> > >>> 1       4       2       13      347     3       5       1       0
> > >>> 0       1       0       7       1       0       0       0       1       0
> > >>>    0         |  386         e     = comp.sys.mac.hardware
> > >>> 0       20      0       4       0       352     4       0       0
> > >>> 0       0       0       1       1       3       0       1       0       1
> > >>>    0         |  387         f     = comp.windows.x
> > >>> 0       2       0       21      5       1       323     7       2
> > >>> 2       0       2       12      0       3       0       0       0       0
> > >>>    1         |  381         g     = misc.forsale
> > >>> 0       1       0       0       1       0       15      363     8
> > >>> 1       0       0       4       1       0       0       0       1       0
> > >>>    1         |  396         h     = rec.autos
> > >>> 0       1       0       0       0       0       6       6       370
> > >>> 0       0       0       0       1       0       0       0       0       1
> > >>>    0         |  385         i     = rec.motorcycles
> > >>> 1       0       0       1       1       0       2       1       2
> > >>> 362     5       0       2       0       0       0       0       0       0
> > >>>    0         |  377         j     = rec.sport.baseball
> > >>> 0       0       0       1       2       0       0       0       0
> > >>> 3       371     0       0       0       0       0       0       0       0
> > >>>    1         |  378         k     = rec.sport.hockey
> > >>> 0       3       1       0       1       0       2       0       0
> > >>> 0       0       396     0       1       0       0       1       1       1
> > >>>    3         |  410         l     = sci.crypt
> > >>> 0       7       0       7       7       2       6       4       0
> > >>> 0       0       1       369     2       2       0       0       0       0
> > >>>    2         |  409         m     = sci.electronics
> > >>> 0       3       0       2       1       0       2       0       0
> > >>> 0       0       1       4       383     4       0       0       1       0
> > >>>    4         |  405         n     = sci.med
> > >>> 0       5       0       0       1       0       3       0       0
> > >>> 0       0       0       1       0       374     1       0       0       1
> > >>>    1         |  387         o     = sci.space
> > >>> 6       2       0       1       1       0       0       1       0
> > >>> 1       0       0       1       5       0       352     2       1       7
> > >>>    1         |  381         p     = soc.religion.christian
> > >>> 1       1       0       0       0       0       0       0       0
> > >>> 0       1       0       0       0       0       0       373     1       0
> > >>>    1         |  378         q     = talk.politics.mideast
> > >>> 0       0       0       0       0       0       1       0       1
> > >>> 0       0       2       0       0       0       0       0       346     2
> > >>>    7         |  359         r     = talk.politics.guns
> > >>> 26      1       0       1       0       0       0       2       0
> > >>> 1       1       0       0       1       1       20      2       6       200
> > >>>    7         |  269         s     = talk.religion.misc
> > >>> 1       0       0       0       0       0       0       2       0
> > >>> 0       1       0       0       2       2       0       1       14      0
> > >>>    286       |  309         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.8726
> > >>> Accuracy                                   90.1019%
> > >>> Reliability                                85.4491%
> > >>> Reliability (standard deviation)            0.2222
> > >>>
> > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> > >>>
> > >>> *SGD:*
> > >>> 7532 test files
> > >>>
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       5649            75%
> > >>> Incorrectly Classified Instances        :       1883            25%
> > >>> Total Classified Instances              :       7532
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 186     6       3       10      5       0       33      4       13
> > >>>  15      7       1       24      15      3       15      5       5       29
> > >>>     15        |  394         a     = sci.space
> > >>> 5       309     0       3       2       5       0       0       0
> > >>> 1       9       21      2       0       0       18      4       4       1
> > >>>    1         |  385         b     = comp.sys.mac.hardware
> > >>> 4       1       101     3       0       1       63      0       7
> > >>> 0       1       1       5       16      3       0       3       7       1
> > >>>    34        |  251         c     = talk.religion.misc
> > >>> 11      12      1       265     1       10      3       0       0
> > >>> 17      10      11      5       2       0       11      3       6       21
> > >>>     0         |  389         d     = comp.graphics
> > >>> 2       1       1       0       349     2       3       0       3
> > >>> 2       6       1       5       1       0       2       15      2       1
> > >>>    2         |  398         e     = rec.motorcycles
> > >>> 7       20      3       19      2       254     6       0       2
> > >>> 11      2       39      7       2       0       4       2       2       9
> > >>>    3         |  394         f     = comp.os.ms-windows.misc
> > >>> 2       1       13      0       0       0       247     0       1
> > >>> 1       3       0       6       2       4       0       2       3       5
> > >>>    29        |  319         g     = alt.atheism
> > >>> 1       1       0       0       2       0       2       361     0
> > >>> 1       2       0       2       0       0       1       3       22      0
> > >>>    1         |  399         h     = rec.sport.hockey
> > >>> 3       0       3       1       0       0       5       0       161
> > >>> 0       1       2       12      102     0       0       1       2       11
> > >>>     6         |  310         i     = talk.politics.misc
> > >>> 2       8       0       19      0       19      0       0       1
> > >>> 294     10      11      4       2       0       5       0       3       11
> > >>>     6         |  395         j     = comp.windows.x
> > >>> 2       10      0       1       1       0       0       0       0
> > >>> 1       347     13      2       1       0       5       3       2       2
> > >>>    0         |  390         k     = misc.forsale
> > >>> 1       36      0       6       1       25      0       0       1
> > >>> 6       10      257     2       1       0       34      6       0       6
> > >>>    0         |  392         l     = comp.sys.ibm.pc.hardware
> > >>> 2       2       2       2       1       0       12      0       0
> > >>> 6       10      4       312     5       2       13      11      3       3
> > >>>    6         |  396         m     = sci.med
> > >>> 2       0       3       2       1       0       0       1       13
> > >>>  0       5       1       2       314     2       0       2       2       10
> > >>>     4         |  364         n     = talk.politics.guns
> > >>> 1       0       2       1       1       0       34      1       33
> > >>>  1       3       0       1       8       271     1       4       5       6
> > >>>      3         |  376         o     = talk.politics.mideast
> > >>> 3       14      0       8       2       8       3       1       1
> > >>> 7       12      29      6       2       1       245     13      2       32
> > >>>     4         |  393         p     = sci.electronics
> > >>> 3       3       0       2       11      0       1       0       2
> > >>> 1       11      6       4       2       0       11      330     4       4
> > >>>    1         |  396         q     = rec.autos
> > >>> 0       0       1       0       1       0       4       12      3
> > >>> 1       3       0       0       0       0       5       6       359     1
> > >>>    1         |  397         r     = rec.sport.baseball
> > >>> 0       1       0       0       0       1       0       0       3
> > >>> 3       0       0       3       2       1       6       1       6       366
> > >>>    3         |  396         s     = sci.crypt
> > >>> 0       2       11      1       1       0       40      0       1
> > >>> 2       3       4       2       1       0       5       0       2       2
> > >>>    321       |  398         t     = soc.religion.christian
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.7073
> > >>> Accuracy                                        75%
> > >>> Reliability                                70.6238%
> > >>> Reliability (standard deviation)            0.2187
> > >>> Log-likelihood                mean      :    -1.1182
> > >>>                               25%-ile   :    -1.6911
> > >>>                               75%-ile   :    -0.0803
> > >>>
> > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
> > >>>
> > >>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
> > >>>> and a few other issues.
> > >>>>
> > >>>> We have removed asf-examples*.sh from trunk as the sample file at the
> > >>>> URL mentioned in your email is not available.
> > >>>> This is something we need to fix and restore in 1.0.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> > >>>> ap.dev@outlook.com> wrote:
> > >>>>
> > >>>> from the asf-email-examples.sh script:
> > >>>>
> > >>>> # You will need to download or otherwise obtain some or all of the
> > >>>> # Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
> > >>>> # to use this script.
> > >>>> # To obtain a full copy you will need to launch an EC2 instance and
> > >>>> # mount the dataset to download it, otherwise you can get a sample of it at
> > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> It looks like the:
> > >>>>
> > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> link is down.
> > >>>>
> > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >>>> > From: andrew.musselman@gmail.com
> > >>>> > To: dev@mahout.apache.org
> > >>>> >
> > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > >>>> >
> > >>>> >
> > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > >>>> >
> > >>>> > > Thanks, Andrew M. I see that some of the example scripts need to be
> > >>>> > > fixed, as they still refer to the deprecated algorithms.
> > >>>> > > I see that Streaming KMeans has failed for you as well.
> > >>>> > >
> > >>>> > > I'll be rolling back the release today to fix these issues.
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > >>>> > > Linux AMI from tarball.
> > >>>> > >
> > >>>> > > All tests pass.
> > >>>> > >
> > >>>> > > *Output of examples:*
> > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > >>>> > > <http://mahout.apache.org>:*
> > >>>> > > *recommendations:*
> > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > >>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> > >>>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > >>>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > >>>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > >>>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
> > >>>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > >>>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > >>>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > >>>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > >>>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > >>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > >>>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; kmeans:*
> > >>>> > > [snip]
> > >>>> > >         Weight : [props - optional]:  Point:
> > >>>> > >         1.0 : [distance-squared=1.0193102046188427]:
> > >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > >>>> 7573:0.204,
> > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> > >>>> 9779:0.159,
> > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > >>>> > >         1.0 : [distance-squared=0.9823018320457279]:
> > >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > >>>> 5336:0.106,
> > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> > >>>> 7832:0.072,
> > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > >>>> > >         1.0 : [distance-squared=0.9509142993214911]:
> > >>>> > >
> > >>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > >>>> > >  4419:0.076,
> > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > >>>> 7235:0.048,
> > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > >>>> 7683:0.077,
> > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > >>>> 10225:0.081,
> > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > >>>> > >  43685:0.086, 44077:0.308,
> > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; dirichlet:*
> > >>>> > > Get this complaint:
> > >>>> > > Running Dirichlet with K = 8
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'dirichlet' chosen.
> > >>>> > >
> > >>>> > > *clustering: minhash:*
> > >>>> > > Running Minhash
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'minhash' chosen.
> > >>>> > >
> > >>>> > > *classification; standard:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances          :       5384       87.7874%
> > >>>> > > Incorrectly Classified Instances        :        749       12.2126%
> > >>>> > > Total Classified Instances              :       6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a       b       c       d       <--Classified as
> > >>>> > > 2949    7       531     25       |  3512        a     = dev
> > >>>> > > 0       0       0       0        |  0           b     = general
> > >>>> > > 99      8       1763    8        |  1878        c     = user
> > >>>> > > 41      1       29      672      |  743         d     = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa                                       0.7877
> > >>>> > > Accuracy                                   87.7874%
> > >>>> > > Reliability                                 53.658%
> > >>>> > > Reliability (standard deviation)            0.4911
> > >>>> > >
> > >>>> > > *classification; complementary:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances          :       5530       90.1679%
> > >>>> > > Incorrectly Classified Instances        :        603        9.8321%
> > >>>> > > Total Classified Instances              :       6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a       b       c       d       <--Classified as
> > >>>> > > 3168    0       276     68       |  3512        a     = dev
> > >>>> > > 0       0       0       0        |  0           b     = general
> > >>>> > > 196     0       1652    30       |  1878        c     = user
> > >>>> > > 25      0       8       710      |  743         d     = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa                                       0.8259
> > >>>> > > Accuracy                                   90.1679%
> > >>>> > > Reliability                                54.7459%
> > >>>> > > Reliability (standard deviation)            0.5005
> > >>>> > >
> > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > >>>> > >
> > >>>> > > *classification; sgd, with three categories:*
> > >>>> > > Running SGD Training
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> > >>>> > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
> > >>>> > > will use command-line arguments only
> > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > >>>> > > 24168 training files
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1000    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1200    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1400    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1500    -0.607  75.78   none
> > >>>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
> > >>>> > > 1.0032261e-08   2000    -0.487  82.65   none
> > >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > >>>> > > 1.0011902e-08   2500    -0.439  83.90   none
> > >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > >>>> > > 1.0011902e-08   3000    -0.439  83.90   none
> > >>>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> > >>>> > > 1.0000001e-08   4000    -0.351  88.14   none
> > >>>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> > >>>> > > 1.0000000e-08   5000    -0.378  87.10   none
> > >>>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
> > >>>> > > 1.0000001e-08   6000    -0.372  86.89   none
> > >>>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> > >>>> > > 1.0000001e-08   7000    -0.334  89.26   none
> > >>>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> > >>>> > > 1.0000000e-08   8000    -0.368  87.52   none
> > >>>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> > >>>> > > 1.0000000e-08   10000   -0.374  87.39   none
> > >>>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> > >>>> > > 1.0000000e-08   12000   -0.298  88.26   none
> > >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > >>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > >>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > >>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>> > >         at java.lang.Thread.run(Thread.java:701)
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > > Trying out the build today
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> > >>>> suneel_marthi@yahoo.com
> > >>>> > > >wrote:
> > >>>> > > >
> > >>>> > > >> This is an issue (a trivial one, though) that needs to be fixed for the
> > >>>> > > >> 0.9 release. I will be rerolling the release today (in the next few hours)
> > >>>> > > >> and putting out a new release candidate in staging.
> > >>>> > > >>
> > >>>> > > >> Thanks for reporting this Andrew P.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > >>>> > > ap.dev@outlook.com>
> > >>>> > > >> wrote:
> > >>>> > > >>
> > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
> > >>>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > >>>> > > >> have run into some problems because of the Hadoop setup. I ran into some
> > >>>> > > >> problems in the example scripts, particularly with
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> > >>>> "amd64",
> > >>>> > > >> family: "unix"
> > >>>> > > >> $MAHOUT_LOCAL=true
> > >>>> > > >> Hadoop 2.2.0
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> a) Verify that u can unpack the release (tar or zip) [passed (tar)]
> > >>>> > > >>
> > >>>> > > >> b) Verify u r able to compile the distro
> > >>>> > > >>
> > >>>> > > >>     mvn compile [passed with warnings]
> > >>>> > > >>
> > >>>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> > >>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > >>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > >>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
> > >>>> > > >>
> > >>>> > > >> c)  Run through the unit tests: mvn clean test
> > >>>> > > >>     mvn clean test [passed]
> > >>>> > > >>
> > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >>>> > > >> Please run through all the different options in each script
> > >>>> > > >>
> > >>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>>> > > >>     [...]
> > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>>> > > >>
> > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > >>>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
> > >>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     cluster-reuters.sh ->1 [works]
> > >>>> > > >>     cluster-reuters.sh ->2 [works]
> > >>>> > > >>     cluster-reuters.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>     Same error as noted previously in the thread:
> > >>>> > > >>
> > >>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
> > >>>> > > >>
> > >>>> > > >>     [...]
> > >>>> > > >>
> > >>>> > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >>>> > > >>     Num clusters: 0; maxDistance: 0.000000
> > >>>> > > >>     [Dunn Index] First: Infinity
> > >>>> > > >>     [Davies-Bouldin Index] First: NaN
> > >>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > >>>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >>>> > > >> > From: suneel_marthi@yahoo.com
> > >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >>>> > > >> >
> > >>>> > > >> > Third time's a Charm!!!
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> > >>>> > > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >>>> > > >> >
> > >>>> > > >> > For those volunteering to test this, some of the things to be verified:
> > >>>> > > >> >
> > >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> > >>>> > > >> > b) Verify u r able to compile the distro
> > >>>> > > >> > c)  Run through the unit tests: mvn clean test
> > >>>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Committers
> > >>>> > > >> >  and PMC members:
> > >>>> > > >> > ---------------------------------------
> > >>>> > > >> >
> > >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Thanks and Regards.
> > >>>> > > >>
> > >>>> > > >
> > >>>> > > >
> > >>>> > >
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> 

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew. I'll put a Release out soon. 




On Wednesday, January 22, 2014 3:52 PM, Andrew Palumbo <ap...@outlook.com> wrote:
 
 
Everything seems to run well on my local machine:

Checked out revision 1560364.

CentOS 6
Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
Hadoop 2.2.0


mvn clean compile -DskipTests [OK, several warnings]
mvn clean test [PASSED ALL]
mvn clean install -DskipTests [OK]


$MAHOUT_LOCAL=true 

classify-20newsgroups.sh->1 [Accuracy    89.3529%]
classify-20newsgroups.sh->2 [Accuracy    90.8317%]
classify-20newsgroups.sh->3 [Accuracy    76.2746%]
classify-20newsgroups.sh->4 [cleans up] 

cluster-reuters.sh->1 [20 clusters]  -kmeans
cluster-reuters.sh->2 [INFO: 20 clusters]  -fkmeans
cluster-reuters.sh->3 [OK]  -lda
cluster-reuters.sh->4 [10 (9) clusters- see attached]  -streaming kmeans

./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]

./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]
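
For anyone comparing the RMSE figures reported in this thread: RMSE is the square root of the mean squared difference between predicted and actual ratings. Below is a minimal Python sketch of that formula. The helper name `rmse` and the four ratings are hypothetical, chosen only for illustration; they are not taken from the MovieLens run and this is not Mahout's evaluator.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on a 1-5 scale, for illustration only.
predicted = [4.5, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 1.0]
print(rmse(predicted, actual))
```

A lower value means the predicted ratings sit closer to the held-out ratings on average.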




Attached is the full output of cluster-reuters.sh->4 Streaming K-Means.



From cluster-reuters.sh->4 Streaming K-Means:

Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 1 [2816]: 3438.913758
Average distance in cluster 2 [112]: 20617.345993
Average distance in cluster 3 [4]: 32504.085379
Average distance in cluster 4 [435]: 18476.579935
Average distance in cluster 5 [27]: 21153.167574
Average distance in cluster 6 [15480]: 2040.864416
Average distance in cluster 7 [1711]: 5281.742482
Average distance in cluster 8 [964]: 15762.976239
Average distance in cluster 9 [28]: 19762.109632
Num clusters: 10; maxDistance: 107106.379648




[Dunn Index] First: 0.002272
[Davies-Bouldin Index] First: 57.871266
Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
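
The summary rows above can be sanity-checked with a short script. The following is a rough Python sketch, not part of the Mahout scripts. It assumes that distance.q0 through distance.q4 are the minimum, the quartiles, and the maximum (an interpretation of the column names; the output does not say so), and the helper names are hypothetical. It sums the count column and lists clusters whose q0 is negative or whose quantiles are not in ascending order, either of which would be unexpected for true distances and may simply reflect approximate quantile estimates.

```python
import csv
import io

# Summary rows copied verbatim from the streaming k-means output above.
SUMMARY = """\
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
"""


def load_summary(text):
    """Parse the CSV summary into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def total_points(rows):
    """Sum the per-cluster point counts."""
    return sum(int(row["count"]) for row in rows)


def negative_minimums(rows):
    """Cluster ids whose distance.q0 is below zero."""
    return [int(row["cluster"]) for row in rows if float(row["distance.q0"]) < 0]


def unordered_quantiles(rows):
    """Cluster ids whose q0..q4 values are not in ascending order."""
    flagged = []
    for row in rows:
        quantiles = [float(row[f"distance.q{i}"]) for i in range(5)]
        if quantiles != sorted(quantiles):
            flagged.append(int(row["cluster"]))
    return flagged


rows = load_summary(SUMMARY)
print("total points:", total_points(rows))
print("negative q0:", negative_minimums(rows))
print("unordered quantiles:", unordered_quantiles(rows))
```

On the rows above this reports 21577 points (21578 when the single point in cluster 0 is included), negative q0 values for clusters 2, 3, 4, 5 and 9, and out-of-order quantiles for cluster 3.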





> From: ap.dev@outlook.com
> To: dev@mahout.apache.org; user@mahout.apache.org
> Subject: RE: MAHOUT 0.9 Release - New URL
> Date: Wed, 22 Jan 2014 09:37:06 -0500
> 
> will do!
> 
> > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > To: dev@mahout.apache.org; user@mahout.apache.org
> > 
> > Andrew M., Andrew P. and others,
> > 
> > Sebastian and I fixed a few issues today (for 0.9):
> > 
> > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references to, and invocations of, algorithms that have been removed from the codebase.
> > b) Fixed the issue with Streaming KMeans clustering and checked in the code. 
> > c) Resurrected the Frequent Pattern Mining implementation for 0.9.
> > 
> > Please check out the latest code from trunk, run a build locally, and run through the example scripts. 
> > 
> > Thanks and Regards.
> > 
> > 
> > 
> > 
> > 
> > 
> > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
> > 
> > *factorize-movielens-1M.sh:*
> > RMSE is: 0.8519064098265133
> > 
> > 
> > Sample recommendations:
> > 
> > 2229    [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > 5848    [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > 3728    [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > 1252    [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > 634     [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > 5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > 2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > 4219    [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > 91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > 502     [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> > 
> > factorize-netflix.sh:
> > References a data set that Netflix took down after the competition, so it is
> > no longer available; the script should at least mention that the data set is
> > no longer online.
> > 
> > 
> > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> > 
> > > *cluster-syntheticcontrol.sh*
> > >
> > > *Canopy:*
> > > [snip]
> > >         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > > 10.017, 17.149, 14.850, 10.890]
> > >         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > > 16.430, 11.898, 15.366]
> > >         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > >         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > >         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > >         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > >         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > >
> > > *K-means:*
> > > [snip]
> > >         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > > 36.946, 36.900, 32.593, 31.563]
> > >         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > > 18.967, 35.473, 30.146, 45.527]
> > >         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > > 41.024, 37.105, 45.623, 43.589]
> > >         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > > 24.508, 34.432, 32.155, 34.839]
> > >         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > > 18.111, 14.754, 27.386, 27.026]
> > >         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > > 39.404, 38.034, 46.458, 44.432]
> > >         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > > 28.215, 34.940, 36.910, 39.749]
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > >
> > > *Fuzzy k-means:*
> > > [snip]
> > >         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > >         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > >         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > >         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > >         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > >
> > > *Dirichlet and Meanshift:*
> > > Already detailed in M-1400, deprecated jobs still referenced.
> > >
> > >
> > >
> > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > >> *cluster-reuters.sh*
> > >> *k-means:*
> > >>
> > >> [snip]
> > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > >>         Top Terms:
> > >>                 banks                                   =>
> > >> 3.841823268955143
> > >>                 bank                                    =>
> > >>  3.80633066361209
> > >>                 debt                                    =>
> > >>  3.28065219870794
> > >>                 said                                    =>
> > >>  2.5965700942088583
> > >>                 he                                      =>
> > >> 2.335682813857497
> > >>                 foreign                                 =>
> > >>  2.2217853688201403
> > >>                 billion                                 =>
> > >>  2.1970193848291335
> > >>                 would                                   =>
> > >>  1.9932392063955617
> > >>                 loans                                   =>
> > >>  1.9309276792854233
> > >>                 interest                                =>
> > >>  1.787324501938
> > >>                 have                                    =>
> > >> 1.762981951432578
> > >>                 its                                     =>
> > >>  1.7615109954971866
> > >>                 which                                   =>
> > >>  1.5822081148036862
> > >>                 has                                     =>
> > >>  1.5600708189041956
> > >>                 dlrs                                    =>
> > >>  1.5571038313005996
> > >>                 finance                                 =>
> > >>  1.5539758811252924
> > >>                 new                                     =>
> > >>  1.5176015811577555
> > >>                 had                                     =>
> > >>  1.5138723701401844
> > >>                 brazil                                  =>
> > >>  1.5083369853593172
> > >>                 payments                                =>
> > >>  1.4539044255886517
> > >>         Weight : [props - optional]:  Point:
> > >>
> > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > >>         Top Terms:
> > >>                 vs                                      =>
> > >> 6.126130791333171
> > >>                 net                                     =>
> > >> 4.012191567277523
> > >>                 cts                                     =>
> > >> 3.822006848832744
> > >>                 shr                                     =>
> > >>  3.6786004856764527
> > >>                 mln                                     =>
> > >>  2.9011643584038698
> > >>                 loss                                    =>
> > >> 2.788368861463607
> > >>                 qtr                                     =>
> > >> 2.714140225051522
> > >>                 revs                                    =>
> > >>  2.4739861236454717
> > >>                 profit                                  =>
> > >>  1.8146888090247015
> > >>                 note                                    =>
> > >>  1.7977163272138388
> > >>                 dlrs                                    =>
> > >>  1.6164390808155846
> > >>                 avg                                     =>
> > >>  1.3901765773336587
> > >>                 shrs                                    =>
> > >>  1.3856326531419314
> > >>                 mths                                    =>
> > >>  1.3168717272038506
> > >>                 4th                                     =>
> > >>  1.2161158425617289
> > >>                 oper                                    =>
> > >> 1.182419473776814
> > >>                 year                                    =>
> > >> 1.178086061733047
> > >>                 nine                                    =>
> > >>  1.0670554836445316
> > >>                 3rd                                     =>
> > >> 1.041334410056592
> > >>                 inc                                     =>
> > >>  1.0019361981554935
> > >>         Weight : [props - optional]:  Point:
> > >>
> > >>
> > >> Inter-Cluster Density: 0.45562152681859414
> > >> Intra-Cluster Density: 0.6952712632167628
> > >> CDbw Inter-Cluster Density: 0.0
> > >> CDbw Intra-Cluster Density: 16.486930227598684
> > >> CDbw Separation: 194.49005884464628
> > >>
> > >> *fuzzy k-means:*
> > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.005, 0.02:0.002, 0.0
> > >>         Top Terms:
> > >>                 said                                    =>
> > >>  1.8665592354713065
> > >>                 its                                     =>
> > >>  1.1335212213411592
> > >>                 pct                                     =>
> > >>  1.0862816801353348
> > >>                 dlrs                                    =>
> > >>  1.0854998884993752
> > >>                 mln                                     =>
> > >> 1.043163996400643
> > >>                 from                                    =>
> > >>  0.9684961110525736
> > >>                 has                                     =>
> > >> 0.912161511978058
> > >>                 company                                 =>
> > >>  0.8754186972808333
> > >>                 mar                                     =>
> > >>  0.8675333452422878
> > >>                 inc                                     =>
> > >>  0.7678617590362815
> > >>                 would                                   =>
> > >>  0.7610968883652675
> > >>                 he                                      =>
> > >>  0.7459988770503974
> > >>                 which                                   =>
> > >>  0.7435613119406804
> > >>                 year                                    =>
> > >>  0.7302840632748394
> > >>                 u.s                                     =>
> > >>  0.7281061062439116
> > >>                 shares                                  =>
> > >>  0.7260764102983083
> > >>                 corp                                    =>
> > >>  0.7179807367808658
> > >>                 new                                     =>
> > >>  0.7044203783157115
> > >>                 stock                                   =>
> > >>  0.6962010978721442
> > >>                 have                                    =>
> > >>  0.6464265467298506
> > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.004, 0.02:0.002, 0.02
> > >>         Top Terms:
> > >>                 said                                    =>
> > >> 1.864911184196927
> > >>                 dlrs                                    =>
> > >> 1.199286689822081
> > >>                 mln                                     =>
> > >>  1.1802134783562215
> > >>                 pct                                     =>
> > >>  1.1529704214798124
> > >>                 its                                     =>
> > >>  1.1184398851519701
> > >>                 from                                    =>
> > >> 1.016647848050332
> > >>                 company                                 =>
> > >> 0.894703604722841
> > >>                 mar                                     =>
> > >> 0.879986159541356
> > >>                 has                                     =>
> > >>  0.8642799128491316
> > >>                 year                                    =>
> > >>  0.8271823503717782
> > >>                 inc                                     =>
> > >>  0.7871293745341424
> > >>                 corp                                    =>
> > >> 0.737705498468879
> > >>                 which                                   =>
> > >> 0.722975201852743
> > >>                 would                                   =>
> > >> 0.708000816484415
> > >>                 u.s                                     =>
> > >>  0.7073294276173905
> > >>                 billion                                 =>
> > >>  0.7055723996916351
> > >>                 he                                      =>
> > >>  0.7042684217823294
> > >>                 new                                     =>
> > >>  0.6834737905434939
> > >>                 shares                                  =>
> > >>  0.6753327384172428
> > >>                 stock                                   =>
> > >>  0.6576225144041699
> > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.006, 0.02:0.002, 0.02
> > >>         Top Terms:
> > >>                 said                                    =>
> > >>  1.8796076179735086
> > >>                 its                                     =>
> > >> 1.172025965452378
> > >>                 dlrs                                    =>
> > >> 1.130422792460914
> > >>                 pct                                     =>
> > >> 1.082038255241358
> > >>                 mln                                     =>
> > >>  1.0772146872767114
> > >>                 company                                 =>
> > >>  0.9662235879639138
> > >>                 from                                    =>
> > >>  0.9473172871605616
> > >>                 has                                     =>
> > >>  0.9224712965830099
> > >>                 mar                                     =>
> > >>  0.8769325856924421
> > >>                 inc                                     =>
> > >>  0.8360245257169788
> > >>                 shares                                  =>
> > >>  0.8334595641384324
> > >>                 stock                                   =>
> > >>  0.7704621839612175
> > >>                 corp                                    =>
> > >>  0.7682400250301806
> > >>                 which                                   =>
> > >>  0.7389988207856137
> > >>                 would                                   =>
> > >>  0.7339708917389389
> > >>                 year                                    =>
> > >>  0.7088414843731325
> > >>                 new                                     =>
> > >>  0.7038109468655172
> > >>                 he                                      =>
> > >>  0.6993994455501005
> > >>                 u.s                                     =>
> > >>  0.6772649147622415
> > >>                 share                                   =>
> > >>  0.6241804830055171
> > >>
> > >> *lda:*
> > >>
> > >> [snip]
> > >> 21539
> > >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > >> 21540
> > >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > >> 21541
> > >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > >> 21542
> > >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > >> 21543
> > >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > >> 21544
> > >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > >> 21545
> > >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > >> 21546
> > >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > >> 21547
> > >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > >> 21548
> > >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > >> 21549
> > >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > >> 21550
> > >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > >> 21551
> > >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > >> 21552
> > >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > >> 21553
> > >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > >> 21554
> > >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > >> 21555
> > >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > >> 21556
> > >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > >> 21557
> > >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > >> 21558
> > >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > >> 21559
> > >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > >> 21560
> > >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > >> 21561
> > >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > >> 21562
> > >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > >> 21563
> > >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > >> 21564
> > >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > >> 21565
> > >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > >> 21566
> > >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > >> 21567
> > >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > >> 21568
> > >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > >> 21569
> > >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > >> 21570
> > >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > >> 21571
> > >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > >> 21572
> > >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > >> 21573
> > >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > >> 21574
> > >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > >> 21575
> > >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > >> 21576
> > >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > >> 21577
> > >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > >>
> > >>
> > >> *Streaming k-means:*
> > >>
> > >> [snip]
> > >> INFO: Number of Centroids: 0
> > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > >> WARNING: job_local23982482_0001
> > > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > > >>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > > >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > > >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > > >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > > >>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > > >>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > > >>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > > >>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > >>
> > >> [snip]
> > >>
> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index] First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
> > >>
> > > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >>
> > >>> *classify-20newsgroups.sh*
> > >>>
> > >>> *Complementary naive bayes:*
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :      11207       98.9406%
> > >>> Incorrectly Classified Instances        :        120        1.0594%
> > >>> Total Classified Instances              :      11327
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > > >>> a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
> > > >>> 475     0       0       1       0       0       0       0       0       0       0       0       0       0       1       0       1       0       0       0       |  478         a     = alt.atheism
> > > >>> 0       597     1       1       0       1       1       0       0       0       0       1       0       2       1       0       0       0       0       0       |  605         b     = comp.graphics
> > > >>> 0       1       620     3       0       1       0       0       0       0       0       1       0       0       1       0       0       0       0       0       |  627         c     = comp.os.ms-windows.misc
> > > >>> 1       1       1       593     2       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       |  599         d     = comp.sys.ibm.pc.hardware
> > > >>> 0       1       1       0       568     0       1       0       0       0       1       1       2       0       0       0       0       1       0       0       |  576         e     = comp.sys.mac.hardware
> > > >>> 0       4       2       0       0       581     0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  587         f     = comp.windows.x
> > > >>> 0       0       0       1       2       0       571     3       0       0       1       1       4       1       0       0       0       0       0       0       |  584         g     = misc.forsale
> > > >>> 0       0       0       1       0       0       0       589     1       0       0       1       1       0       0       0       0       0       0       0       |  593         h     = rec.autos
> > > >>> 0       0       0       0       0       0       0       1       565     0       0       0       0       0       1       0       0       0       0       0       |  567         i     = rec.motorcycles
> > > >>> 0       0       0       0       0       0       0       0       0       600     2       0       0       0       1       0       0       0       0       0       |  603         j     = rec.sport.baseball
> > > >>> 0       0       0       0       0       0       0       0       0       1       584     0       0       0       0       0       0       0       0       0       |  585         k     = rec.sport.hockey
> > > >>> 0       0       0       0       0       0       0       0       0       0       0       579     0       0       0       0       0       1       0       0       |  580         l     = sci.crypt
> > > >>> 0       0       0       1       3       0       2       0       0       2       0       0       567     1       2       1       0       0       0       0       |  579         m     = sci.electronics
> > > >>> 0       0       0       0       0       0       0       0       0       0       0       0       1       605     0       0       0       0       0       0       |  606         n     = sci.med
> > > >>> 0       0       0       0       0       0       0       0       0       0       0       0       0       0       602     0       0       0       0       0       |  602         o     = sci.space
> > > >>> 0       0       0       0       0       0       0       0       0       0       0       0       0       1       0       602     0       0       1       0       |  604         p     = soc.religion.christian
> > > >>> 0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       556     0       0       0       |  556         q     = talk.politics.mideast
> > > >>> 0       0       1       0       0       0       0       0       0       0       0       1       0       0       1       0       0       568     0       0       |  571         r     = talk.politics.guns
> > > >>> 11      0       0       0       0       0       0       0       0       1       0       0       0       1       3       8       1       4       338     2       |  369         s     = talk.religion.misc
> > > >>> 0       0       0       0       0       0       0       0       0       0       1       0       0       0       1       0       3       4       0       447     |  456         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.9806
> > >>> Accuracy                                   98.9406%
> > >>> Reliability                                94.0932%
> > >>> Reliability (standard deviation)            0.2163
> > >>>
> > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Complementary Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       6715       89.3071%
> > >>> Incorrectly Classified Instances        :        804       10.6929%
> > >>> Total Classified Instances              :       7519
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 298     0       0       0       0       0       0       0       0
> > >>> 1       0       0       0       1       2       5       1       0       13
> > >>>     0         |  321         a     = alt.atheism
> > >>> 0       298     11      6       1       12      2       2       1
> > >>> 1       3       8       3       4       2       4       1       4       4
> > >>>    1         |  368         b     = comp.graphics
> > >>> 1       17      286     16      4       9       6       3       2
> > >>> 0       1       0       1       7       1       0       2       1       0
> > >>>    1         |  358         c     = comp.os.ms-windows.misc
> > >>> 2       6       11      309     9       5       14      8       1
> > >>> 0       2       0       6       4       2       0       1       2       1
> > >>>    0         |  383         d     = comp.sys.ibm.pc.hardware
> > >>> 0       10      8       7       334     7       5       5       2
> > >>> 0       3       0       2       1       1       0       1       1       0
> > >>>    0         |  387         e     = comp.sys.mac.hardware
> > >>> 1       13      7       8       2       355     2       0       2
> > >>> 0       0       5       1       1       3       0       0       1       0
> > >>>    0         |  401         f     = comp.windows.x
> > >>> 0       7       11      29      12      9       268     16      8
> > >>> 4       3       2       6       4       2       1       3       1       2
> > >>>    3         |  391         g     = misc.forsale
> > >>> 0       1       0       0       3       0       7       362     8
> > >>> 2       2       1       2       0       2       0       1       2       0
> > >>>    4         |  397         h     = rec.autos
> > >>> 0       0       0       1       0       0       1       0       423
> > >>> 0       0       0       2       1       0       1       0       0       0
> > >>>    0         |  429         i     = rec.motorcycles
> > >>> 0       0       1       0       0       0       0       2       2
> > >>> 371     8       0       2       3       0       2       0       0       0
> > >>>    0         |  391         j     = rec.sport.baseball
> > >>> 0       0       1       0       0       0       1       0       0
> > >>> 2       409     0       0       0       0       0       0       0       0
> > >>>    1         |  414         k     = rec.sport.hockey
> > >>> 0       0       1       2       1       0       1       0       0
> > >>> 0       0       404     0       0       0       0       0       1       0
> > >>>    1         |  411         l     = sci.crypt
> > >>> 0       5       4       11      1       3       7       9       2
> > >>> 5       3       3       339     2       6       0       1       1       2
> > >>>    1         |  405         m     = sci.electronics
> > >>> 0       4       0       1       0       0       0       1       0
> > >>> 1       1       0       3       367     3       1       2       0       0
> > >>>    0         |  384         n     = sci.med
> > >>> 0       1       2       0       1       0       2       0       0
> > >>> 1       0       0       1       1       375     0       1       0       0
> > >>>    0         |  385         o     = sci.space
> > >>> 4       2       1       1       0       0       1       1       2
> > >>> 0       0       1       1       5       1       367     4       0       1
> > >>>    1         |  393         p     = soc.religion.christian
> > >>> 0       1       0       0       0       0       0       0       0
> > >>> 2       0       0       0       0       0       2       378     0       1
> > >>>    0         |  384         q     = talk.politics.mideast
> > >>> 0       0       0       0       0       2       1       1       1
> > >>> 1       0       3       0       3       0       0       2       319     2
> > >>>    4         |  339         r     = talk.politics.guns
> > >>> 32      0       0       1       0       0       0       0       0
> > >>> 1       1       1       0       2       2       26      5       7       175
> > >>>    6         |  259         s     = talk.religion.misc
> > >>> 0       0       0       2       0       0       0       0       0
> > >>> 1       2       2       0       1       2       1       10      18      2
> > >>>    278       |  319         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.8594
> > >>> Accuracy                                   89.3071%
> > >>> Reliability                                 84.611%
> > >>> Reliability (standard deviation)            0.2148
> > >>>
> > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> > >>>
> > >>>
> > >>> *Naive bayes:*
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :      11286       99.0869%
> > >>> Incorrectly Classified Instances        :        104        0.9131%
> > >>> Total Classified Instances              :      11390
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 474     0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       2
> > >>>    1         |  477         a     = alt.atheism
> > >>> 0       566     0       2       0       1       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  569         b     = comp.graphics
> > >>> 0       10      590     29      2       4       1       0       0
> > >>> 0       0       0       1       0       0       0       0       0       0
> > >>>    1         |  638         c     = comp.os.ms-windows.misc
> > >>> 0       0       0       596     0       0       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  596         d     = comp.sys.ibm.pc.hardware
> > >>> 0       0       0       0       575     0       1       0       0
> > >>> 0       0       0       1       0       0       0       0       0       0
> > >>>    0         |  577         e     = comp.sys.mac.hardware
> > >>> 0       2       2       2       0       593     1       0       0
> > >>> 0       0       0       0       0       1       0       0       0       0
> > >>>    0         |  601         f     = comp.windows.x
> > >>> 0       0       0       1       0       0       589     1       0
> > >>> 0       1       0       2       0       0       0       0       0       0
> > >>>    0         |  594         g     = misc.forsale
> > >>> 0       0       0       0       0       0       0       594     0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  594         h     = rec.autos
> > >>> 0       0       0       0       0       0       0       0       611
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  611         i     = rec.motorcycles
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 616     1       0       0       0       0       0       0       0       0
> > >>>    0         |  617         j     = rec.sport.baseball
> > >>> 0       0       0       0       0       0       1       0       0
> > >>> 0       620     0       0       0       0       0       0       0       0
> > >>>    0         |  621         k     = rec.sport.hockey
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       580     0       0       0       0       0       1       0
> > >>>    0         |  581         l     = sci.crypt
> > >>> 0       0       0       3       1       0       0       0       0
> > >>> 0       0       0       571     0       0       0       0       0       0
> > >>>    0         |  575         m     = sci.electronics
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       0       2       583     0       0       0       0       0
> > >>>    0         |  585         n     = sci.med
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       1       599     0       0       0       0
> > >>>    0         |  600         o     = sci.space
> > >>> 0       1       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       615     0       0       0
> > >>>    0         |  616         p     = soc.religion.christian
> > >>> 1       0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       1       560     0       0
> > >>>    0         |  562         q     = talk.politics.mideast
> > >>> 0       0       1       0       0       0       0       0       0
> > >>> 0       0       1       0       0       0       0       0       548     0
> > >>>    1         |  551         r     = talk.politics.guns
> > >>> 10      0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       1       1       0       2       344
> > >>>    1         |  359         s     = talk.religion.misc
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       1       1       0       0       0       0       2       0
> > >>>    462       |  466         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.9847
> > >>> Accuracy                                   99.0869%
> > >>> Reliability                                94.3334%
> > >>> Reliability (standard deviation)            0.2169
> > >>>
> > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       6718       90.1019%
> > >>> Incorrectly Classified Instances        :        738        9.8981%
> > >>> Total Classified Instances              :       7456
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 294     0       0       0       0       0       0       0       0
> > >>> 0       0       2       0       1       1       6       1       1       16
> > >>>     0         |  322         a     = alt.atheism
> > >>> 0       345     6       14      6       11      6       0       0
> > >>> 0       0       5       7       1       3       0       0       0       0
> > >>>    0         |  404         b     = comp.graphics
> > >>> 2       29      177     78      22      19      9       1       0
> > >>> 0       0       4       2       0       1       1       0       0       1
> > >>>    1         |  347         c     = comp.os.ms-windows.misc
> > >>> 1       9       2       335     18      2       10      0       0
> > >>> 0       1       0       8       0       0       0       0       0       0
> > >>>    0         |  386         d     = comp.sys.ibm.pc.hardware
> > >>> 1       4       2       13      347     3       5       1       0
> > >>> 0       1       0       7       1       0       0       0       1       0
> > >>>    0         |  386         e     = comp.sys.mac.hardware
> > >>> 0       20      0       4       0       352     4       0       0
> > >>> 0       0       0       1       1       3       0       1       0       1
> > >>>    0         |  387         f     = comp.windows.x
> > >>> 0       2       0       21      5       1       323     7       2
> > >>> 2       0       2       12      0       3       0       0       0       0
> > >>>    1         |  381         g     = misc.forsale
> > >>> 0       1       0       0       1       0       15      363     8
> > >>> 1       0       0       4       1       0       0       0       1       0
> > >>>    1         |  396         h     = rec.autos
> > >>> 0       1       0       0       0       0       6       6       370
> > >>> 0       0       0       0       1       0       0       0       0       1
> > >>>    0         |  385         i     = rec.motorcycles
> > >>> 1       0       0       1       1       0       2       1       2
> > >>> 362     5       0       2       0       0       0       0       0       0
> > >>>    0         |  377         j     = rec.sport.baseball
> > >>> 0       0       0       1       2       0       0       0       0
> > >>> 3       371     0       0       0       0       0       0       0       0
> > >>>    1         |  378         k     = rec.sport.hockey
> > >>> 0       3       1       0       1       0       2       0       0
> > >>> 0       0       396     0       1       0       0       1       1       1
> > >>>    3         |  410         l     = sci.crypt
> > >>> 0       7       0       7       7       2       6       4       0
> > >>> 0       0       1       369     2       2       0       0       0       0
> > >>>    2         |  409         m     = sci.electronics
> > >>> 0       3       0       2       1       0       2       0       0
> > >>> 0       0       1       4       383     4       0       0       1       0
> > >>>    4         |  405         n     = sci.med
> > >>> 0       5       0       0       1       0       3       0       0
> > >>> 0       0       0       1       0       374     1       0       0       1
> > >>>    1         |  387         o     = sci.space
> > >>> 6       2       0       1       1       0       0       1       0
> > >>> 1       0       0       1       5       0       352     2       1       7
> > >>>    1         |  381         p     = soc.religion.christian
> > >>> 1       1       0       0       0       0       0       0       0
> > >>> 0       1       0       0       0       0       0       373     1       0
> > >>>    1         |  378         q     = talk.politics.mideast
> > >>> 0       0       0       0       0       0       1       0       1
> > >>> 0       0       2       0       0       0       0       0       346     2
> > >>>    7         |  359         r     = talk.politics.guns
> > >>> 26      1       0       1       0       0       0       2       0
> > >>> 1       1       0       0       1       1       20      2       6       200
> > >>>    7         |  269         s     = talk.religion.misc
> > >>> 1       0       0       0       0       0       0       2       0
> > >>> 0       1       0       0       2       2       0       1       14      0
> > >>>    286       |  309         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.8726
> > >>> Accuracy                                   90.1019%
> > >>> Reliability                                85.4491%
> > >>> Reliability (standard deviation)            0.2222
> > >>>
> > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> > >>>
> > >>> *SGD:*
> > >>> 7532 test files
> > >>>
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       5649            75%
> > >>> Incorrectly Classified Instances        :       1883            25%
> > >>> Total Classified Instances              :       7532
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 186     6       3       10      5       0       33      4       13
> > >>>  15      7       1       24      15      3       15      5       5       29
> > >>>     15        |  394         a     = sci.space
> > >>> 5       309     0       3       2       5       0       0       0
> > >>> 1       9       21      2       0       0       18      4       4       1
> > >>>    1         |  385         b     = comp.sys.mac.hardware
> > >>> 4       1       101     3       0       1       63      0       7
> > >>> 0       1       1       5       16      3       0       3       7       1
> > >>>    34        |  251         c     = talk.religion.misc
> > >>> 11      12      1       265     1       10      3       0       0
> > >>> 17      10      11      5       2       0       11      3       6       21
> > >>>     0         |  389         d     = comp.graphics
> > >>> 2       1       1       0       349     2       3       0       3
> > >>> 2       6       1       5       1       0       2       15      2       1
> > >>>    2         |  398         e     = rec.motorcycles
> > >>> 7       20      3       19      2       254     6       0       2
> > >>> 11      2       39      7       2       0       4       2       2       9
> > >>>    3         |  394         f     = comp.os.ms-windows.misc
> > >>> 2       1       13      0       0       0       247     0       1
> > >>> 1       3       0       6       2       4       0       2       3       5
> > >>>    29        |  319         g     = alt.atheism
> > >>> 1       1       0       0       2       0       2       361     0
> > >>> 1       2       0       2       0       0       1       3       22      0
> > >>>    1         |  399         h     = rec.sport.hockey
> > >>> 3       0       3       1       0       0       5       0       161
> > >>> 0       1       2       12      102     0       0       1       2       11
> > >>>     6         |  310         i     = talk.politics.misc
> > >>> 2       8       0       19      0       19      0       0       1
> > >>> 294     10      11      4       2       0       5       0       3       11
> > >>>     6         |  395         j     = comp.windows.x
> > >>> 2       10      0       1       1       0       0       0       0
> > >>> 1       347     13      2       1       0       5       3       2       2
> > >>>    0         |  390         k     = misc.forsale
> > >>> 1       36      0       6       1       25      0       0       1
> > >>> 6       10      257     2       1       0       34      6       0       6
> > >>>    0         |  392         l     = comp.sys.ibm.pc.hardware
> > >>> 2       2       2       2       1       0       12      0       0
> > >>> 6       10      4       312     5       2       13      11      3       3
> > >>>    6         |  396         m     = sci.med
> > >>> 2       0       3       2       1       0       0       1       13
> > >>>  0       5       1       2       314     2       0       2       2       10
> > >>>     4         |  364         n     = talk.politics.guns
> > >>> 1       0       2       1       1       0       34      1       33
> > >>>  1       3       0       1       8       271     1       4       5       6
> > >>>      3         |  376         o     = talk.politics.mideast
> > >>> 3       14      0       8       2       8       3       1       1
> > >>> 7       12      29      6       2       1       245     13      2       32
> > >>>     4         |  393         p     = sci.electronics
> > >>> 3       3       0       2       11      0       1       0       2
> > >>> 1       11      6       4       2       0       11      330     4       4
> > >>>    1         |  396         q     = rec.autos
> > >>> 0       0       1       0       1       0       4       12      3
> > >>> 1       3       0       0       0       0       5       6       359     1
> > >>>    1         |  397         r     = rec.sport.baseball
> > >>> 0       1       0       0       0       1       0       0       3
> > >>> 3       0       0       3       2       1       6       1       6       366
> > >>>    3         |  396         s     = sci.crypt
> > >>> 0       2       11      1       1       0       40      0       1
> > >>> 2       3       4       2       1       0       5       0       2       2
> > >>>    321       |  398         t     = soc.religion.christian
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.7073
> > >>> Accuracy                                        75%
> > >>> Reliability                                70.6238%
> > >>> Reliability (standard deviation)            0.2187
> > >>> Log-likelihood                mean      :    -1.1182
> > >>>                               25%-ile   :    -1.6911
> > >>>                               75%-ile   :    -0.0803
> > >>>
> > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
> > >>>
> > >>>> Thanks Andrew for reporting that. I rolled back the release to fix this
> > >>>> and a few other issues.
> > >>>>
> > >>>> We have removed asf-examples*.sh from trunk as the sample file at the
> > >>>> URL mentioned in your email is not available.
> > >>>> This is something we need to fix and restore in 1.0.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> > >>>> ap.dev@outlook.com> wrote:
> > >>>>
> > >>>> from the asf-email-examples.sh script:
> > >>>>
> > >>>> # You will need to download or otherwise obtain some or all of the
> > >>>> # Amazon ASF Email Public Dataset
> > >>>> # (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> > >>>> # To obtain a full copy you will need to launch an EC2 instance and
> > >>>> # mount the dataset to download it, otherwise you can get a sample of it at
> > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> It looks like the:
> > >>>>
> > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> link is down.
> > >>>>
> > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >>>> > From: andrew.musselman@gmail.com
> > >>>> > To: dev@mahout.apache.org
> > >>>> >
> > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > >>>> >
> > >>>> >
> > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> > >>>> suneel_marthi@yahoo.com>wrote:
> > >>>> >
> > >>>> > > Thanks Andrew M., see that some of the example scripts need to be
> > >>>> > > fixed as they still refer to the deprecated algorithms.
> > >>>> > > See that the Streaming KMeans has failed for you as well.
> > >>>> > >
> > >>>> > > I'll be rolling back the release today to fix these issues.
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> > >>>> > > 64-bit Linux AMI from tarball.
> > >>>> > >
> > >>>> > > All tests pass.
> > >>>> > >
> > >>>> > > *Output of examples:*
> > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > >>>> > > <http://mahout.apache.org>:*
> > >>>> > > *recommendations:*
> > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > >>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> > >>>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > >>>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > >>>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > >>>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
> > >>>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > >>>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > >>>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > >>>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > >>>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > >>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > >>>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > >>>> > > [snip]
> > >>>> > >
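For anyone who wants to sanity-check the recommender output programmatically rather than by eye, here is a minimal sketch that parses one line of the recommendations file. It assumes each record is a single unwrapped line of the form `userID<whitespace>[itemID:score,...]`, as in the line for user 19 above; the helper name `parse_recommendation_line` is hypothetical and not part of Mahout.

```python
import re

# Assumed record layout: "<userID><whitespace>[<itemID>:<score>,...]"
LINE_RE = re.compile(r"^(\d+)\s+\[(.*)\]$")


def parse_recommendation_line(line):
    """Return (user_id, [(item_id, score), ...]) for one output line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized recommendation line: %r" % line)
    user_id = int(match.group(1))
    body = match.group(2)
    items = []
    if body:
        for pair in body.split(","):
            item_id, score = pair.split(":")
            items.append((int(item_id), float(score)))
    return user_id, items


# Sample line copied from the output quoted above.
sample = "19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]"
user, recs = parse_recommendation_line(sample)
print(user, len(recs), recs[0])
```

A check like this makes it easy to confirm that every user has at most the requested number of recommendations and that all scores parse as numbers.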
> > >>>> > > *clustering; kmeans:*
> > >>>> > > [snip]
> > >>>> > >         Weight : [props - optional]:  Point:
> > >>>> > >         1.0 : [distance-squared=1.0193102046188427]:
> > >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > >>>> 7573:0.204,
> > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> > >>>> 9779:0.159,
> > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > >>>> > >         1.0 : [distance-squared=0.9823018320457279]:
> > >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > >>>> 5336:0.106,
> > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> > >>>> 7832:0.072,
> > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > >>>> > >         1.0 : [distance-squared=0.9509142993214911]:
> > >>>> > >
> > >>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > >>>> > >  4419:0.076,
> > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > >>>> 7235:0.048,
> > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > >>>> 7683:0.077,
> > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > >>>> 10225:0.081,
> > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
> > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; dirichlet:*
> > >>>> > > Get this complaint:
> > >>>> > > Running Dirichlet with K = 8
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > >>>> > > HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > >>>> > > classpath, will use command-line arguments only
> > >>>> > > Unknown program 'dirichlet' chosen.
> > >>>> > >
> > >>>> > > *clustering: minhash:*
> > >>>> > > Running Minhash
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > >>>> > > HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> > >>>> > > classpath, will use command-line arguments only
> > >>>> > > Unknown program 'minhash' chosen.
> > >>>> > >
> > >>>> > > *classification; standard:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances          :       5384       87.7874%
> > >>>> > > Incorrectly Classified Instances        :        749       12.2126%
> > >>>> > > Total Classified Instances              :       6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a       b       c       d       <--Classified as
> > >>>> > > 2949    7       531     25       |  3512        a     = dev
> > >>>> > > 0       0       0       0        |  0           b     = general
> > >>>> > > 99      8       1763    8        |  1878        c     = user
> > >>>> > > 41      1       29      672      |  743         d     = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa                                       0.7877
> > >>>> > > Accuracy                                   87.7874%
> > >>>> > > Reliability                                 53.658%
> > >>>> > > Reliability (standard deviation)            0.4911
> > >>>> > >
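The summary figures can be cross-checked against the confusion matrix itself. The sketch below recomputes accuracy and per-class recall from the four-class matrix quoted above. It assumes rows are the true labels and columns the predicted labels, which is consistent with each row summing to the total shown after the bar; the function names are illustrative only and are not Mahout APIs.

```python
# Confusion matrix copied from the "classification; standard" output above.
# Assumption: rows = true label, columns = predicted label.
labels = ["dev", "general", "user", "commits"]
matrix = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]


def accuracy(m):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total


def per_class_recall(m):
    """Recall per true class; None when a class has no instances."""
    recalls = []
    for i, row in enumerate(m):
        row_total = sum(row)
        recalls.append(m[i][i] / row_total if row_total else None)
    return recalls


print("accuracy: %.4f%%" % (100 * accuracy(matrix)))
for label, recall in zip(labels, per_class_recall(matrix)):
    print(label, recall)
```

The diagonal sums to 5384 out of 6133 instances, which agrees with the "Correctly Classified Instances" line in the summary. Note that the `general` class has no test instances here, so its recall is undefined.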
> > >>>> > > *classification; complementary:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances          :       5530       90.1679%
> > >>>> > > Incorrectly Classified Instances        :        603        9.8321%
> > >>>> > > Total Classified Instances              :       6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a       b       c       d       <--Classified as
> > >>>> > > 3168    0       276     68       |  3512        a     = dev
> > >>>> > > 0       0       0       0        |  0           b     = general
> > >>>> > > 196     0       1652    30       |  1878        c     = user
> > >>>> > > 25      0       8       710      |  743         d     = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa                                       0.8259
> > >>>> > > Accuracy                                   90.1679%
> > >>>> > > Reliability                                54.7459%
> > >>>> > > Reliability (standard deviation)            0.5005
> > >>>> > >
> > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > >>>> > >
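As a cross-check on the complementary Naive Bayes summary above, the accuracy can be recomputed from the confusion matrix. The following is a minimal Python sketch, not Mahout code: the function names are made up for illustration, and the kappa shown is the textbook Cohen's kappa, which may differ slightly from the figure Mahout prints.

```python
# Confusion matrix copied from the complementary Naive Bayes run above.
# Rows are actual classes, columns are predicted classes
# (order: dev, general, user, commits).
CONFUSION = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Textbook Cohen's kappa: agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(f"accuracy = {accuracy(CONFUSION):.4%}")
    print(f"kappa    = {cohen_kappa(CONFUSION):.4f}")
```

The accuracy works out to 5530 / 6133, which agrees with the "Correctly Classified Instances" line in the summary.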
> > >>>> > > *classification; sgd, with three categories:*
> > >>>> > > Running SGD Training
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > >>>> > > 24168 training files
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1000    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1200    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1400    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1500    -0.607  75.78   none
> > >>>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
> > >>>> > > 1.0032261e-08   2000    -0.487  82.65   none
> > >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > >>>> > > 1.0011902e-08   2500    -0.439  83.90   none
> > >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > >>>> > > 1.0011902e-08   3000    -0.439  83.90   none
> > >>>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> > >>>> > > 1.0000001e-08   4000    -0.351  88.14   none
> > >>>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> > >>>> > > 1.0000000e-08   5000    -0.378  87.10   none
> > >>>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
> > >>>> > > 1.0000001e-08   6000    -0.372  86.89   none
> > >>>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> > >>>> > > 1.0000001e-08   7000    -0.334  89.26   none
> > >>>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> > >>>> > > 1.0000000e-08   8000    -0.368  87.52   none
> > >>>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> > >>>> > > 1.0000000e-08   10000   -0.374  87.39   none
> > >>>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> > >>>> > > 1.0000000e-08   12000   -0.298  88.26   none
> > >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > >>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > >>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > >>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>> > >         at java.lang.Thread.run(Thread.java:701)
> > >>>> > >
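One possible reading of the trace above (an assumption, not something confirmed in this thread): the run was started with --categories=[3], while the confusion matrices earlier in this message list four labels (dev, general, user, commits). If the trainer encodes a label k > 0 at position k - 1 of a vector of length categories - 1, then a label index of 3 would write to position 2 of a 2-element vector, which would be consistent with the ArrayIndexOutOfBoundsException: 2 raised from DenseVector.setQuick. The Python sketch below only illustrates that failure mode; encode_target is a hypothetical helper, not Mahout's API.

```python
def encode_target(actual, num_categories):
    """Encode a class label against a reference class (hypothetical helper).

    Category 0 is treated as the reference class and maps to the all-zero
    vector; category k > 0 sets position k - 1. The vector therefore has
    num_categories - 1 entries.
    """
    target = [0.0] * (num_categories - 1)
    if actual != 0:
        # Raises IndexError when actual >= num_categories, i.e. when the
        # data contains more distinct labels than the model was sized for.
        target[actual - 1] = 1.0
    return target


if __name__ == "__main__":
    print(encode_target(2, 3))      # label fits a 3-category model
    try:
        encode_target(3, 3)         # a fourth label with only 3 categories
    except IndexError as err:
        print("out of range:", err)
```

If this reading is right, rerunning with a category count that covers every label in the training data would be the first thing to try.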
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > > Trying out the build today
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > >>>> > > >
> > >>>> > > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > >>>> > > >> Release, will be rerolling the release today (in the next few hrs) and
> > >>>> > > >> putting out a new release candidate in staging.
> > >>>> > > >>
> > >>>> > > >> Thanks for reporting this Andrew P.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
> > >>>> > > >>
> > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had
> > >>>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > >>>> > > >> have run into some problems because of the Hadoop setup.  Ran into some
> > >>>> > > >> problems in the example scripts, particularly with
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the
> > >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
> > >>>> > > >> $MAHOUT_LOCAL=true
> > >>>> > > >> Hadoop 2.2.0
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
> > >>>> > > >>
> > >>>> > > >> b) Verify u r able to compile the distro
> > >>>> > > >>
> > >>>> > > >>     mvn compile [passed with warnings]
> > >>>> > > >>
> > >>>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> > >>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > >>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > >>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
> > >>>> > > >>
> > >>>> > > >> c)  Run through the unit tests: mvn clean test
> > >>>> > > >>     mvn clean test [passed]
> > >>>> > > >>
> > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >>>> > > >> Please run through all the different options in each script
> > >>>> > > >>
> > >>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>>> > > >>     [...]
> > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>>> > > >>
> > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > >>>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
> > >>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     cluster-reuters.sh ->1 [works]
> > >>>> > > >>     cluster-reuters.sh ->2 [works]
> > >>>> > > >>     cluster-reuters.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>     Same error as noted previously in the thread:
> > >>>> > > >>
> > >>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
> > >>>> > > >>
> > >>>> > > >>     [...]
> > >>>> > > >>
> > >>>> > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >>>> > > >>     Num clusters: 0; maxDistance: 0.000000
> > >>>> > > >>     [Dunn Index] First: Infinity
> > >>>> > > >>     [Davies-Bouldin Index] First: NaN
> > >>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > >>>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>>> > > >>
> 

RE: MAHOUT 0.9 Release - New URL

Posted by Andrew Palumbo <ap...@outlook.com>.
Everything seems to run well on my local machine:

Checked out revision 1560364.

CentOS 6
Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
Hadoop 2.2.0


mvn clean compile -DSkipTests [OK-Several Warnings]
mvn clean test [PASSED ALL]
mvn clean install -DskipTests [OK]


$MAHOUT_LOCAL=true 

classify-20newsgroups.sh->1 [Accuracy    89.3529%]
classify-20newsgroups.sh->2 [Accuracy    90.8317%]
classify-20newsgroups.sh->3 [Accuracy    76.2746%]
classify-20newsgroups.sh->4 [cleans up] 

cluster-reuters.sh->1 [20 clusters]  -kmeans
cluster-reuters.sh->2 [INFO: 20 clusters]  -fkmeans
cluster-reuters.sh->3 [OK]  -lda
cluster-reuters.sh->4 [10 (9) clusters- see attached]  -streaming kmeans

./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]

./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]
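
For readers unfamiliar with the metric reported by factorize-movielens-1M.sh: RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal sketch in Python follows; the sample ratings are made up for illustration and are not taken from the MovieLens run.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical predictions and held-out ratings on a 1-5 scale.
    print(rmse([4.5, 3.0, 2.0], [5.0, 3.0, 1.0]))
```

Lower is better; a value of 0 would mean every held-out rating was predicted exactly.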




Attached is the full output of cluster-reuters.sh->4 Streaming K-Means.



From cluster-reuters.sh->4 Streaming K-Means:

Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 1 [2816]: 3438.913758
Average distance in cluster 2 [112]: 20617.345993
Average distance in cluster 3 [4]: 32504.085379
Average distance in cluster 4 [435]: 18476.579935
Average distance in cluster 5 [27]: 21153.167574
Average distance in cluster 6 [15480]: 2040.864416
Average distance in cluster 7 [1711]: 5281.742482
Average distance in cluster 8 [964]: 15762.976239
Average distance in cluster 9 [28]: 19762.109632
Num clusters: 10; maxDistance: 107106.379648




[Dunn Index] First: 0.002272
[Davies-Bouldin Index] First: 57.871266
Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train




> From: ap.dev@outlook.com
> To: dev@mahout.apache.org; user@mahout.apache.org
> Subject: RE: MAHOUT 0.9 Release - New URL
> Date: Wed, 22 Jan 2014 09:37:06 -0500
> 
> will do!
> 
> > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > To: dev@mahout.apache.org; user@mahout.apache.org
> > 
> > Andrew M., Andrew P. and others,
> > 
> > Sebastian and I fixed a few issues today (for 0.9):
> > 
> > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references/invocations to algorithms that have been removed from the codebase.
> > b) Fixed the issue with Streaming Kmeans clustering and checked in the code.  
> > c) Resurrected Frequent Pattern Mining implementation for 0.9.
> > 
> > Please check out the latest code from trunk, run a build locally and run through the example scripts.
> > 
> > Thanks and Regards.
> > 
> > 
> > 
> > 
> > 
> > 
> > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
> >  
> > *factorize-movielens-1M.sh:*
> > RMSE is: 0.8519064098265133
> > 
> > 
> > Sample recommendations:
> > 
> > 2229    [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > 5848    [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > 3728    [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > 1252    [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > 634     [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > 5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > 2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > 4219    [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > 91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > 502     [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> > 
> > factorize-netflix.sh:
> > References a no-longer-available data set that Netflix took down after the
> > competition; the script should at least mention that the data set is no
> > longer "online".
> > 
> > 
> > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> > 
> > > *cluster-syntheticcontrol.sh*
> > >
> > > *Canopy:*
> > > [snip]
> > >         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > > 10.017, 17.149, 14.850, 10.890]
> > >         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > > 16.430, 11.898, 15.366]
> > >         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > >         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > >         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > >         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > >         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > >
> > > *K-means:*
> > > [snip]
> > >         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > > 36.946, 36.900, 32.593, 31.563]
> > >         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > > 18.967, 35.473, 30.146, 45.527]
> > >         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > > 41.024, 37.105, 45.623, 43.589]
> > >         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > > 24.508, 34.432, 32.155, 34.839]
> > >         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > > 18.111, 14.754, 27.386, 27.026]
> > >         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > > 39.404, 38.034, 46.458, 44.432]
> > >         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > > 28.215, 34.940, 36.910, 39.749]
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > >
> > > *Fuzzy k-means:*
> > > [snip]
> > >         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > >         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > >         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > >         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > >         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > >
> > > *Dirichlet and Meanshift:*
> > > Already detailed in M-1400: the script still references deprecated jobs.
> > >
> > >
> > >
> > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > >> *cluster-reuters.sh*
> > >> *k-means:*
> > >>
> > >> [snip]
> > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > >>         Top Terms:
> > >>                 banks     => 3.841823268955143
> > >>                 bank      => 3.80633066361209
> > >>                 debt      => 3.28065219870794
> > >>                 said      => 2.5965700942088583
> > >>                 he        => 2.335682813857497
> > >>                 foreign   => 2.2217853688201403
> > >>                 billion   => 2.1970193848291335
> > >>                 would     => 1.9932392063955617
> > >>                 loans     => 1.9309276792854233
> > >>                 interest  => 1.787324501938
> > >>                 have      => 1.762981951432578
> > >>                 its       => 1.7615109954971866
> > >>                 which     => 1.5822081148036862
> > >>                 has       => 1.5600708189041956
> > >>                 dlrs      => 1.5571038313005996
> > >>                 finance   => 1.5539758811252924
> > >>                 new       => 1.5176015811577555
> > >>                 had       => 1.5138723701401844
> > >>                 brazil    => 1.5083369853593172
> > >>                 payments  => 1.4539044255886517
> > >>         Weight : [props - optional]:  Point:
> > >>
> > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > >>         Top Terms:
> > >>                 vs        => 6.126130791333171
> > >>                 net       => 4.012191567277523
> > >>                 cts       => 3.822006848832744
> > >>                 shr       => 3.6786004856764527
> > >>                 mln       => 2.9011643584038698
> > >>                 loss      => 2.788368861463607
> > >>                 qtr       => 2.714140225051522
> > >>                 revs      => 2.4739861236454717
> > >>                 profit    => 1.8146888090247015
> > >>                 note      => 1.7977163272138388
> > >>                 dlrs      => 1.6164390808155846
> > >>                 avg       => 1.3901765773336587
> > >>                 shrs      => 1.3856326531419314
> > >>                 mths      => 1.3168717272038506
> > >>                 4th       => 1.2161158425617289
> > >>                 oper      => 1.182419473776814
> > >>                 year      => 1.178086061733047
> > >>                 nine      => 1.0670554836445316
> > >>                 3rd       => 1.041334410056592
> > >>                 inc       => 1.0019361981554935
> > >>         Weight : [props - optional]:  Point:
> > >>
> > >>
> > >> Inter-Cluster Density: 0.45562152681859414
> > >> Intra-Cluster Density: 0.6952712632167628
> > >> CDbw Inter-Cluster Density: 0.0
> > >> CDbw Intra-Cluster Density: 16.486930227598684
> > >> CDbw Separation: 194.49005884464628
> > >>
> > >> *fuzzy k-means:*
> > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.005, 0.02:0.002, 0.0
> > >>         Top Terms:
> > >>                 said      => 1.8665592354713065
> > >>                 its       => 1.1335212213411592
> > >>                 pct       => 1.0862816801353348
> > >>                 dlrs      => 1.0854998884993752
> > >>                 mln       => 1.043163996400643
> > >>                 from      => 0.9684961110525736
> > >>                 has       => 0.912161511978058
> > >>                 company   => 0.8754186972808333
> > >>                 mar       => 0.8675333452422878
> > >>                 inc       => 0.7678617590362815
> > >>                 would     => 0.7610968883652675
> > >>                 he        => 0.7459988770503974
> > >>                 which     => 0.7435613119406804
> > >>                 year      => 0.7302840632748394
> > >>                 u.s       => 0.7281061062439116
> > >>                 shares    => 0.7260764102983083
> > >>                 corp      => 0.7179807367808658
> > >>                 new       => 0.7044203783157115
> > >>                 stock     => 0.6962010978721442
> > >>                 have      => 0.6464265467298506
> > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.004, 0.02:0.002, 0.02
> > >>         Top Terms:
> > >>                 said      => 1.864911184196927
> > >>                 dlrs      => 1.199286689822081
> > >>                 mln       => 1.1802134783562215
> > >>                 pct       => 1.1529704214798124
> > >>                 its       => 1.1184398851519701
> > >>                 from      => 1.016647848050332
> > >>                 company   => 0.894703604722841
> > >>                 mar       => 0.879986159541356
> > >>                 has       => 0.8642799128491316
> > >>                 year      => 0.8271823503717782
> > >>                 inc       => 0.7871293745341424
> > >>                 corp      => 0.737705498468879
> > >>                 which     => 0.722975201852743
> > >>                 would     => 0.708000816484415
> > >>                 u.s       => 0.7073294276173905
> > >>                 billion   => 0.7055723996916351
> > >>                 he        => 0.7042684217823294
> > >>                 new       => 0.6834737905434939
> > >>                 shares    => 0.6753327384172428
> > >>                 stock     => 0.6576225144041699
> > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.006, 0.02:0.002, 0.02
> > >>         Top Terms:
> > >>                 said      => 1.8796076179735086
> > >>                 its       => 1.172025965452378
> > >>                 dlrs      => 1.130422792460914
> > >>                 pct       => 1.082038255241358
> > >>                 mln       => 1.0772146872767114
> > >>                 company   => 0.9662235879639138
> > >>                 from      => 0.9473172871605616
> > >>                 has       => 0.9224712965830099
> > >>                 mar       => 0.8769325856924421
> > >>                 inc       => 0.8360245257169788
> > >>                 shares    => 0.8334595641384324
> > >>                 stock     => 0.7704621839612175
> > >>                 corp      => 0.7682400250301806
> > >>                 which     => 0.7389988207856137
> > >>                 would     => 0.7339708917389389
> > >>                 year      => 0.7088414843731325
> > >>                 new       => 0.7038109468655172
> > >>                 he        => 0.6993994455501005
> > >>                 u.s       => 0.6772649147622415
> > >>                 share     => 0.6241804830055171
> > >>
> > >> *lda:*
> > >>
> > >> [snip]
> > >> 21539
> > >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > >> 21540
> > >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > >> 21541
> > >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > >> 21542
> > >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > >> 21543
> > >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > >> 21544
> > >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > >> 21545
> > >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > >> 21546
> > >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > >> 21547
> > >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > >> 21548
> > >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > >> 21549
> > >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > >> 21550
> > >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > >> 21551
> > >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > >> 21552
> > >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > >> 21553
> > >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > >> 21554
> > >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > >> 21555
> > >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > >> 21556
> > >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > >> 21557
> > >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > >> 21558
> > >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > >> 21559
> > >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > >> 21560
> > >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > >> 21561
> > >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > >> 21562
> > >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > >> 21563
> > >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > >> 21564
> > >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > >> 21565
> > >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > >> 21566
> > >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > >> 21567
> > >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > >> 21568
> > >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > >> 21569
> > >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > >> 21570
> > >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > >> 21571
> > >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > >> 21572
> > >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > >> 21573
> > >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > >> 21574
> > >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > >> 21575
> > >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > >> 21576
> > >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > >> 21577
> > >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > >>
> > >>
> > >> *Streaming k-means:*
> > >>
> > >> [snip]
> > >> INFO: Number of Centroids: 0
> > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > >> WARNING: job_local23982482_0001
> > >> java.lang.IllegalArgumentException: Must have nonzero number of training
> > >> and test vectors. Asked for %.1f %% of %d vectors for test
> > >> [10.000000149011612, 0]
> > >>         at
> > >> com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > >>         at
> > >> org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > >>         at
> > >> org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > >>         at
> > >> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > >>         at
> > >> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > >>         at
> > >> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > >>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > >>         at
> > >> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > >>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > >>         at
> > >> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > >>
> > >> [snip]
> > >>
> > >> WARNING: No qualcluster.props found on classpath, will use command-line
> > >> arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index] First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > >> cluster,distance.mean,distance.sd
> > >> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
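[Note on the streaming k-means trace above] The log reports "Number of Centroids: 0" just before BallKMeans.splitTrainTest throws, and the bracketed arguments [10.000000149011612, 0] read as "10% of 0 vectors". The literal %.1f and %d left in the message are consistent with Guava's Preconditions.checkArgument, which substitutes only %s placeholders and appends any leftover arguments in square brackets. The snippet below is only a rough Python sketch of that kind of guard, not Mahout's code; the function name, parameter names, and the exact split rule are assumptions made for illustration.

```python
def split_train_test(vectors, test_fraction):
    """Split vectors into (train, test); hypothetical stand-in for the check
    that fails in the trace above, not the Mahout implementation."""
    num_test = int(test_fraction * len(vectors))
    # Both partitions must be non-empty, so an empty input can never pass.
    if not (0 < num_test < len(vectors)):
        raise ValueError(
            "Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test"
            % (test_fraction * 100, len(vectors))
        )
    return vectors[num_test:], vectors[:num_test]


if __name__ == "__main__":
    train, test = split_train_test(list(range(20)), 0.25)
    print(len(train), len(test))
    try:
        split_train_test([], 0.1)  # zero centroids, as in the log
    except ValueError as err:
        print(err)
```

If that reading is right, the exception is a symptom rather than the cause: the thing to chase is why no centroids reached the reducer in the first place.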
> > >>
> > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> > >> andrew.musselman@gmail.com> wrote:
> > >>
> > >>> *classify-20newsgroups.sh*
> > >>>
> > >>> *Complementary naive bayes:*
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :      11207       98.9406%
> > >>> Incorrectly Classified Instances        :        120        1.0594%
> > >>> Total Classified Instances              :      11327
> > >>>
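[Note on the summary above] As a quick sanity check when comparing runs, the percentages can be recomputed from the raw counts. This is a small Python sketch using only the numbers printed in the summary; the helper name is made up for illustration.

```python
def classification_percentages(correct, incorrect):
    """Return (total, percent_correct, percent_incorrect) from raw counts."""
    total = correct + incorrect
    return total, 100.0 * correct / total, 100.0 * incorrect / total


if __name__ == "__main__":
    total, pct_ok, pct_bad = classification_percentages(11207, 120)
    print("Total: %d  Correct: %.4f%%  Incorrect: %.4f%%" % (total, pct_ok, pct_bad))
```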
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> > >>> 475 0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0    |  478  a = alt.atheism
> > >>> 0   597 1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0    |  605  b = comp.graphics
> > >>> 0   1   620 3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0    |  627  c = comp.os.ms-windows.misc
> > >>> 1   1   1   593 2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0    |  599  d = comp.sys.ibm.pc.hardware
> > >>> 0   1   1   0   568 0   1   0   0   0   1   1   2   0   0   0   0   1   0   0    |  576  e = comp.sys.mac.hardware
> > >>> 0   4   2   0   0   581 0   0   0   0   0   0   0   0   0   0   0   0   0   0    |  587  f = comp.windows.x
> > >>> 0   0   0   1   2   0   571 3   0   0   1   1   4   1   0   0   0   0   0   0    |  584  g = misc.forsale
> > >>> 0   0   0   1   0   0   0   589 1   0   0   1   1   0   0   0   0   0   0   0    |  593  h = rec.autos
> > >>> 0   0   0   0   0   0   0   1   565 0   0   0   0   0   1   0   0   0   0   0    |  567  i = rec.motorcycles
> > >>> 0   0   0   0   0   0   0   0   0   600 2   0   0   0   1   0   0   0   0   0    |  603  j = rec.sport.baseball
> > >>> 0   0   0   0   0   0   0   0   0   1   584 0   0   0   0   0   0   0   0   0    |  585  k = rec.sport.hockey
> > >>> 0   0   0   0   0   0   0   0   0   0   0   579 0   0   0   0   0   1   0   0    |  580  l = sci.crypt
> > >>> 0   0   0   1   3   0   2   0   0   2   0   0   567 1   2   1   0   0   0   0    |  579  m = sci.electronics
> > >>> 0   0   0   0   0   0   0   0   0   0   0   0   1   605 0   0   0   0   0   0    |  606  n = sci.med
> > >>> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   602 0   0   0   0   0    |  602  o = sci.space
> > >>> 0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   602 0   0   1   0    |  604  p = soc.religion.christian
> > >>> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   556 0   0   0    |  556  q = talk.politics.mideast
> > >>> 0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0   568 0   0    |  571  r = talk.politics.guns
> > >>> 11      0       0       0       0       0       0       0       0
> > >>> 1       0       0       0       1       3       8       1       4       338
> > >>>    2         |  369         s     = talk.religion.misc
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       1       0       0       0       1       0       3       4       0
> > >>>    447       |  456         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.9806
> > >>> Accuracy                                   98.9406%
> > >>> Reliability                                94.0932%
> > >>> Reliability (standard deviation)            0.2163
> > >>>
> > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors
> > >>> -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex
> > >>> -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Complementary Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       6715       89.3071%
> > >>> Incorrectly Classified Instances        :        804       10.6929%
> > >>> Total Classified Instances              :       7519
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 298     0       0       0       0       0       0       0       0
> > >>> 1       0       0       0       1       2       5       1       0       13
> > >>>     0         |  321         a     = alt.atheism
> > >>> 0       298     11      6       1       12      2       2       1
> > >>> 1       3       8       3       4       2       4       1       4       4
> > >>>    1         |  368         b     = comp.graphics
> > >>> 1       17      286     16      4       9       6       3       2
> > >>> 0       1       0       1       7       1       0       2       1       0
> > >>>    1         |  358         c     = comp.os.ms-windows.misc
> > >>> 2       6       11      309     9       5       14      8       1
> > >>> 0       2       0       6       4       2       0       1       2       1
> > >>>    0         |  383         d     = comp.sys.ibm.pc.hardware
> > >>> 0       10      8       7       334     7       5       5       2
> > >>> 0       3       0       2       1       1       0       1       1       0
> > >>>    0         |  387         e     = comp.sys.mac.hardware
> > >>> 1       13      7       8       2       355     2       0       2
> > >>> 0       0       5       1       1       3       0       0       1       0
> > >>>    0         |  401         f     = comp.windows.x
> > >>> 0       7       11      29      12      9       268     16      8
> > >>> 4       3       2       6       4       2       1       3       1       2
> > >>>    3         |  391         g     = misc.forsale
> > >>> 0       1       0       0       3       0       7       362     8
> > >>> 2       2       1       2       0       2       0       1       2       0
> > >>>    4         |  397         h     = rec.autos
> > >>> 0       0       0       1       0       0       1       0       423
> > >>> 0       0       0       2       1       0       1       0       0       0
> > >>>    0         |  429         i     = rec.motorcycles
> > >>> 0       0       1       0       0       0       0       2       2
> > >>> 371     8       0       2       3       0       2       0       0       0
> > >>>    0         |  391         j     = rec.sport.baseball
> > >>> 0       0       1       0       0       0       1       0       0
> > >>> 2       409     0       0       0       0       0       0       0       0
> > >>>    1         |  414         k     = rec.sport.hockey
> > >>> 0       0       1       2       1       0       1       0       0
> > >>> 0       0       404     0       0       0       0       0       1       0
> > >>>    1         |  411         l     = sci.crypt
> > >>> 0       5       4       11      1       3       7       9       2
> > >>> 5       3       3       339     2       6       0       1       1       2
> > >>>    1         |  405         m     = sci.electronics
> > >>> 0       4       0       1       0       0       0       1       0
> > >>> 1       1       0       3       367     3       1       2       0       0
> > >>>    0         |  384         n     = sci.med
> > >>> 0       1       2       0       1       0       2       0       0
> > >>> 1       0       0       1       1       375     0       1       0       0
> > >>>    0         |  385         o     = sci.space
> > >>> 4       2       1       1       0       0       1       1       2
> > >>> 0       0       1       1       5       1       367     4       0       1
> > >>>    1         |  393         p     = soc.religion.christian
> > >>> 0       1       0       0       0       0       0       0       0
> > >>> 2       0       0       0       0       0       2       378     0       1
> > >>>    0         |  384         q     = talk.politics.mideast
> > >>> 0       0       0       0       0       2       1       1       1
> > >>> 1       0       3       0       3       0       0       2       319     2
> > >>>    4         |  339         r     = talk.politics.guns
> > >>> 32      0       0       1       0       0       0       0       0
> > >>> 1       1       1       0       2       2       26      5       7       175
> > >>>    6         |  259         s     = talk.religion.misc
> > >>> 0       0       0       2       0       0       0       0       0
> > >>> 1       2       2       0       1       2       1       10      18      2
> > >>>    278       |  319         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.8594
> > >>> Accuracy                                   89.3071%
> > >>> Reliability                                 84.611%
> > >>> Reliability (standard deviation)            0.2148
> > >>>
> > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
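The summary percentages in these logs follow directly from the instance counts. A minimal Python sketch that reproduces the figures of the complementary holdout run above; the function name is hypothetical and not part of Mahout:

```python
from typing import Tuple


def summary_percentages(correct: int, incorrect: int) -> Tuple[float, float]:
    """Return (percent correct, percent incorrect) for a classification run."""
    total = correct + incorrect
    return 100.0 * correct / total, 100.0 * incorrect / total


# Counts taken from the "Complementary Results" holdout summary above.
acc, err = summary_percentages(6715, 804)
print(f"{acc:.4f}% correct, {err:.4f}% incorrect")

# The elapsed time is logged in milliseconds and then converted to minutes.
print(10840 / 60_000)
```

The same check can be applied to any of the other Summary blocks by substituting their counts.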
> > >>>
> > >>>
> > >>> *Naive bayes:*
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :      11286       99.0869%
> > >>> Incorrectly Classified Instances        :        104        0.9131%
> > >>> Total Classified Instances              :      11390
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 474     0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       2
> > >>>    1         |  477         a     = alt.atheism
> > >>> 0       566     0       2       0       1       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  569         b     = comp.graphics
> > >>> 0       10      590     29      2       4       1       0       0
> > >>> 0       0       0       1       0       0       0       0       0       0
> > >>>    1         |  638         c     = comp.os.ms-windows.misc
> > >>> 0       0       0       596     0       0       0       0       0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  596         d     = comp.sys.ibm.pc.hardware
> > >>> 0       0       0       0       575     0       1       0       0
> > >>> 0       0       0       1       0       0       0       0       0       0
> > >>>    0         |  577         e     = comp.sys.mac.hardware
> > >>> 0       2       2       2       0       593     1       0       0
> > >>> 0       0       0       0       0       1       0       0       0       0
> > >>>    0         |  601         f     = comp.windows.x
> > >>> 0       0       0       1       0       0       589     1       0
> > >>> 0       1       0       2       0       0       0       0       0       0
> > >>>    0         |  594         g     = misc.forsale
> > >>> 0       0       0       0       0       0       0       594     0
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  594         h     = rec.autos
> > >>> 0       0       0       0       0       0       0       0       611
> > >>> 0       0       0       0       0       0       0       0       0       0
> > >>>    0         |  611         i     = rec.motorcycles
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 616     1       0       0       0       0       0       0       0       0
> > >>>    0         |  617         j     = rec.sport.baseball
> > >>> 0       0       0       0       0       0       1       0       0
> > >>> 0       620     0       0       0       0       0       0       0       0
> > >>>    0         |  621         k     = rec.sport.hockey
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       580     0       0       0       0       0       1       0
> > >>>    0         |  581         l     = sci.crypt
> > >>> 0       0       0       3       1       0       0       0       0
> > >>> 0       0       0       571     0       0       0       0       0       0
> > >>>    0         |  575         m     = sci.electronics
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       0       2       583     0       0       0       0       0
> > >>>    0         |  585         n     = sci.med
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       1       599     0       0       0       0
> > >>>    0         |  600         o     = sci.space
> > >>> 0       1       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       615     0       0       0
> > >>>    0         |  616         p     = soc.religion.christian
> > >>> 1       0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       0       1       560     0       0
> > >>>    0         |  562         q     = talk.politics.mideast
> > >>> 0       0       1       0       0       0       0       0       0
> > >>> 0       0       1       0       0       0       0       0       548     0
> > >>>    1         |  551         r     = talk.politics.guns
> > >>> 10      0       0       0       0       0       0       0       0
> > >>> 0       0       0       0       0       1       1       0       2       344
> > >>>    1         |  359         s     = talk.religion.misc
> > >>> 0       0       0       0       0       0       0       0       0
> > >>> 0       0       1       1       0       0       0       0       2       0
> > >>>    462       |  466         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.9847
> > >>> Accuracy                                   99.0869%
> > >>> Reliability                                94.3334%
> > >>> Reliability (standard deviation)            0.2169
> > >>>
> > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       6718       90.1019%
> > >>> Incorrectly Classified Instances        :        738        9.8981%
> > >>> Total Classified Instances              :       7456
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 294     0       0       0       0       0       0       0       0
> > >>> 0       0       2       0       1       1       6       1       1       16
> > >>>     0         |  322         a     = alt.atheism
> > >>> 0       345     6       14      6       11      6       0       0
> > >>> 0       0       5       7       1       3       0       0       0       0
> > >>>    0         |  404         b     = comp.graphics
> > >>> 2       29      177     78      22      19      9       1       0
> > >>> 0       0       4       2       0       1       1       0       0       1
> > >>>    1         |  347         c     = comp.os.ms-windows.misc
> > >>> 1       9       2       335     18      2       10      0       0
> > >>> 0       1       0       8       0       0       0       0       0       0
> > >>>    0         |  386         d     = comp.sys.ibm.pc.hardware
> > >>> 1       4       2       13      347     3       5       1       0
> > >>> 0       1       0       7       1       0       0       0       1       0
> > >>>    0         |  386         e     = comp.sys.mac.hardware
> > >>> 0       20      0       4       0       352     4       0       0
> > >>> 0       0       0       1       1       3       0       1       0       1
> > >>>    0         |  387         f     = comp.windows.x
> > >>> 0       2       0       21      5       1       323     7       2
> > >>> 2       0       2       12      0       3       0       0       0       0
> > >>>    1         |  381         g     = misc.forsale
> > >>> 0       1       0       0       1       0       15      363     8
> > >>> 1       0       0       4       1       0       0       0       1       0
> > >>>    1         |  396         h     = rec.autos
> > >>> 0       1       0       0       0       0       6       6       370
> > >>> 0       0       0       0       1       0       0       0       0       1
> > >>>    0         |  385         i     = rec.motorcycles
> > >>> 1       0       0       1       1       0       2       1       2
> > >>> 362     5       0       2       0       0       0       0       0       0
> > >>>    0         |  377         j     = rec.sport.baseball
> > >>> 0       0       0       1       2       0       0       0       0
> > >>> 3       371     0       0       0       0       0       0       0       0
> > >>>    1         |  378         k     = rec.sport.hockey
> > >>> 0       3       1       0       1       0       2       0       0
> > >>> 0       0       396     0       1       0       0       1       1       1
> > >>>    3         |  410         l     = sci.crypt
> > >>> 0       7       0       7       7       2       6       4       0
> > >>> 0       0       1       369     2       2       0       0       0       0
> > >>>    2         |  409         m     = sci.electronics
> > >>> 0       3       0       2       1       0       2       0       0
> > >>> 0       0       1       4       383     4       0       0       1       0
> > >>>    4         |  405         n     = sci.med
> > >>> 0       5       0       0       1       0       3       0       0
> > >>> 0       0       0       1       0       374     1       0       0       1
> > >>>    1         |  387         o     = sci.space
> > >>> 6       2       0       1       1       0       0       1       0
> > >>> 1       0       0       1       5       0       352     2       1       7
> > >>>    1         |  381         p     = soc.religion.christian
> > >>> 1       1       0       0       0       0       0       0       0
> > >>> 0       1       0       0       0       0       0       373     1       0
> > >>>    1         |  378         q     = talk.politics.mideast
> > >>> 0       0       0       0       0       0       1       0       1
> > >>> 0       0       2       0       0       0       0       0       346     2
> > >>>    7         |  359         r     = talk.politics.guns
> > >>> 26      1       0       1       0       0       0       2       0
> > >>> 1       1       0       0       1       1       20      2       6       200
> > >>>    7         |  269         s     = talk.religion.misc
> > >>> 1       0       0       0       0       0       0       2       0
> > >>> 0       1       0       0       2       2       0       1       14      0
> > >>>    286       |  309         t     = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.8726
> > >>> Accuracy                                   90.1019%
> > >>> Reliability                                85.4491%
> > >>> Reliability (standard deviation)            0.2222
> > >>>
> > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
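Each Statistics block reports a Kappa value next to the accuracy. As a rough illustration only, here is a sketch of the textbook Cohen's kappa computed from a confusion matrix. Whether Mahout's reported Kappa uses exactly this formula is an assumption that this thread does not confirm, and the 2x2 matrix below is made up for the example:

```python
from typing import List


def cohens_kappa(matrix: List[List[int]]) -> float:
    """Textbook Cohen's kappa; rows are actual classes, columns are predictions."""
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy matrix, not taken from the logs above.
toy = [[20, 5], [10, 15]]
print(cohens_kappa(toy))
```

For the toy matrix the observed agreement is 0.7 and the chance agreement is 0.5, so kappa comes out at about 0.4.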
> > >>>
> > >>> *SGD:*
> > >>> 7532 test files
> > >>>
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances          :       5649            75%
> > >>> Incorrectly Classified Instances        :       1883            25%
> > >>> Total Classified Instances              :       7532
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a       b       c       d       e       f       g       h       i
> > >>> j       k       l       m       n       o       p       q       r       s
> > >>>    t        <--Classified as
> > >>> 186     6       3       10      5       0       33      4       13
> > >>>  15      7       1       24      15      3       15      5       5       29
> > >>>     15        |  394         a     = sci.space
> > >>> 5       309     0       3       2       5       0       0       0
> > >>> 1       9       21      2       0       0       18      4       4       1
> > >>>    1         |  385         b     = comp.sys.mac.hardware
> > >>> 4       1       101     3       0       1       63      0       7
> > >>> 0       1       1       5       16      3       0       3       7       1
> > >>>    34        |  251         c     = talk.religion.misc
> > >>> 11      12      1       265     1       10      3       0       0
> > >>> 17      10      11      5       2       0       11      3       6       21
> > >>>     0         |  389         d     = comp.graphics
> > >>> 2       1       1       0       349     2       3       0       3
> > >>> 2       6       1       5       1       0       2       15      2       1
> > >>>    2         |  398         e     = rec.motorcycles
> > >>> 7       20      3       19      2       254     6       0       2
> > >>> 11      2       39      7       2       0       4       2       2       9
> > >>>    3         |  394         f     = comp.os.ms-windows.misc
> > >>> 2       1       13      0       0       0       247     0       1
> > >>> 1       3       0       6       2       4       0       2       3       5
> > >>>    29        |  319         g     = alt.atheism
> > >>> 1       1       0       0       2       0       2       361     0
> > >>> 1       2       0       2       0       0       1       3       22      0
> > >>>    1         |  399         h     = rec.sport.hockey
> > >>> 3       0       3       1       0       0       5       0       161
> > >>> 0       1       2       12      102     0       0       1       2       11
> > >>>     6         |  310         i     = talk.politics.misc
> > >>> 2       8       0       19      0       19      0       0       1
> > >>> 294     10      11      4       2       0       5       0       3       11
> > >>>     6         |  395         j     = comp.windows.x
> > >>> 2       10      0       1       1       0       0       0       0
> > >>> 1       347     13      2       1       0       5       3       2       2
> > >>>    0         |  390         k     = misc.forsale
> > >>> 1       36      0       6       1       25      0       0       1
> > >>> 6       10      257     2       1       0       34      6       0       6
> > >>>    0         |  392         l     = comp.sys.ibm.pc.hardware
> > >>> 2       2       2       2       1       0       12      0       0
> > >>> 6       10      4       312     5       2       13      11      3       3
> > >>>    6         |  396         m     = sci.med
> > >>> 2       0       3       2       1       0       0       1       13
> > >>>  0       5       1       2       314     2       0       2       2       10
> > >>>     4         |  364         n     = talk.politics.guns
> > >>> 1       0       2       1       1       0       34      1       33
> > >>>  1       3       0       1       8       271     1       4       5       6
> > >>>      3         |  376         o     = talk.politics.mideast
> > >>> 3       14      0       8       2       8       3       1       1
> > >>> 7       12      29      6       2       1       245     13      2       32
> > >>>     4         |  393         p     = sci.electronics
> > >>> 3       3       0       2       11      0       1       0       2
> > >>> 1       11      6       4       2       0       11      330     4       4
> > >>>    1         |  396         q     = rec.autos
> > >>> 0       0       1       0       1       0       4       12      3
> > >>> 1       3       0       0       0       0       5       6       359     1
> > >>>    1         |  397         r     = rec.sport.baseball
> > >>> 0       1       0       0       0       1       0       0       3
> > >>> 3       0       0       3       2       1       6       1       6       366
> > >>>    3         |  396         s     = sci.crypt
> > >>> 0       2       11      1       1       0       40      0       1
> > >>> 2       3       4       2       1       0       5       0       2       2
> > >>>    321       |  398         t     = soc.religion.christian
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa                                       0.7073
> > >>> Accuracy                                        75%
> > >>> Reliability                                70.6238%
> > >>> Reliability (standard deviation)            0.2187
> > >>> Log-likelihood                mean      :    -1.1182
> > >>>                               25%-ile   :    -1.6911
> > >>>                               75%-ile   :    -0.0803
> > >>>
> > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
> > >>>
> > >>>> Thanks Andrew for reporting that. I rolled back the release to fix this
> > >>>> and a few other issues.
> > >>>>
> > >>>> We have removed asf-examples*.sh from trunk as the sample file at the
> > >>>> URL mentioned in your email is not available.
> > >>>> This is something we need to fix and restore in 1.0.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap.dev@outlook.com> wrote:
> > >>>>
> > >>>> from the asf-email-examples.sh script:
> > >>>>
> > >>>> # You will need to download or otherwise obtain some or all of the
> > >>>> # Amazon ASF Email Public Dataset
> > >>>> # (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> > >>>> # To obtain a full copy you will need to launch an EC2 instance and
> > >>>> # mount the dataset to download it, otherwise you can get a sample of it at
> > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> It looks like the:
> > >>>>
> > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> link is down.
> > >>>>
> > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >>>> > From: andrew.musselman@gmail.com
> > >>>> > To: dev@mahout.apache.org
> > >>>> >
> > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > >>>> >
> > >>>> >
> > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > >>>> >
> > >>>> > > Thanks Andrew M. I see that some of the example scripts need to be
> > >>>> > > fixed, as they still refer to the deprecated algorithms.
> > >>>> > > I see that Streaming KMeans has failed for you as well.
> > >>>> > >
> > >>>> > > I'll be rolling back the release today to fix these issues.
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> > >>>> > > 64-bit Linux AMI from tarball.
> > >>>> > >
> > >>>> > > All tests pass.
> > >>>> > >
> > >>>> > > *Output of examples:*
> > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > >>>> > > <http://mahout.apache.org>:*
> > >>>> > > *recommendations:*
> > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > >>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> > >>>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > >>>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > >>>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > >>>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
> > >>>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > >>>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > >>>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > >>>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > >>>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > >>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > >>>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > >>>> > > [snip]
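For anyone wanting to sanity-check this recommendations output programmatically, here is a minimal Python sketch of a line parser. It assumes each record is a user ID, some whitespace, and a bracketed comma-separated list of itemID:score pairs, as in the line for user 19 above; the function name is hypothetical and not part of Mahout:

```python
from typing import List, Tuple


def parse_recommendation_line(line: str) -> Tuple[int, List[Tuple[int, float]]]:
    """Parse 'userID <whitespace> [item:score,item:score,...]' into Python values."""
    user_part, items_part = line.strip().split(None, 1)
    items = []
    for pair in items_part.strip().strip("[]").split(","):
        if not pair:
            continue  # tolerate an empty recommendation list
        item_id, score = pair.split(":")
        items.append((int(item_id), float(score)))
    return int(user_part), items


user, recs = parse_recommendation_line("8\t[12758:1.0,19409:1.0,11112:1.0]")
print(user, recs)
```

Lines wrapped by the mail client (user ID and list on separate lines) would need to be rejoined before being passed to this function.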
> > >>>> > >
> > >>>> > > *clustering; kmeans:*
> > >>>> > > [snip]
> > >>>> > >         Weight : [props - optional]:  Point:
> > >>>> > >         1.0 : [distance-squared=1.0193102046188427]:
> > >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > >>>> > >         1.0 : [distance-squared=0.9823018320457279]:
> > >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > >>>> > >         1.0 : [distance-squared=0.9509142993214911]:
> > >>>> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > >>>> > >  4419:0.076,
> > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > >>>> 7235:0.048,
> > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > >>>> 7683:0.077,
> > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > >>>> 10225:0.081,
> > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > >>>> > >  43685:0.086, 44077:0.308,
> > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; dirichlet:*
> > >>>> > > Get this complaint:
> > >>>> > > Running Dirichlet with K = 8
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'dirichlet' chosen.
> > >>>> > >
> > >>>> > > *clustering: minhash:*
> > >>>> > > Running Minhash
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'minhash' chosen.
> > >>>> > >
> > >>>> > > *classification; standard:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances          :       5384       87.7874%
> > >>>> > > Incorrectly Classified Instances        :        749       12.2126%
> > >>>> > > Total Classified Instances              :       6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a       b       c       d       <--Classified as
> > >>>> > > 2949    7       531     25       |  3512        a     = dev
> > >>>> > > 0       0       0       0        |  0           b     = general
> > >>>> > > 99      8       1763    8        |  1878        c     = user
> > >>>> > > 41      1       29      672      |  743         d     = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa                                       0.7877
> > >>>> > > Accuracy                                   87.7874%
> > >>>> > > Reliability                                 53.658%
> > >>>> > > Reliability (standard deviation)            0.4911
> > >>>> > >
> > >>>> > > *classification; complementary:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances          :       5530       90.1679%
> > >>>> > > Incorrectly Classified Instances        :        603        9.8321%
> > >>>> > > Total Classified Instances              :       6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a       b       c       d       <--Classified as
> > >>>> > > 3168    0       276     68       |  3512        a     = dev
> > >>>> > > 0       0       0       0        |  0           b     = general
> > >>>> > > 196     0       1652    30       |  1878        c     = user
> > >>>> > > 25      0       8       710      |  743         d     = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa                                       0.8259
> > >>>> > > Accuracy                                   90.1679%
> > >>>> > > Reliability                                54.7459%
> > >>>> > > Reliability (standard deviation)            0.5005
> > >>>> > >
> > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > >>>> > >
> > >>>> > > *classification; sgd, with three categories:*
> > >>>> > > Running SGD Training
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > >>>> > > 24168 training files
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> > >>>> > > 0.000   0.00    none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1000    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1200    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1400    -0.607  75.78   none
> > >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > >>>> > > 1.0019413e-08   1500    -0.607  75.78   none
> > >>>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
> > >>>> > > 1.0032261e-08   2000    -0.487  82.65   none
> > >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > >>>> > > 1.0011902e-08   2500    -0.439  83.90   none
> > >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > >>>> > > 1.0011902e-08   3000    -0.439  83.90   none
> > >>>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> > >>>> > > 1.0000001e-08   4000    -0.351  88.14   none
> > >>>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> > >>>> > > 1.0000000e-08   5000    -0.378  87.10   none
> > >>>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
> > >>>> > > 1.0000001e-08   6000    -0.372  86.89   none
> > >>>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> > >>>> > > 1.0000001e-08   7000    -0.334  89.26   none
> > >>>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> > >>>> > > 1.0000000e-08   8000    -0.368  87.52   none
> > >>>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> > >>>> > > 1.0000000e-08   10000   -0.374  87.39   none
> > >>>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> > >>>> > > 1.0000000e-08   12000   -0.298  88.26   none
> > >>>> > > Exception in thread "main" java.lang.IllegalStateException:
> > >>>> > > java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > >>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > >>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > >>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > >>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > >>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > >>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>> > >         at java.lang.Thread.run(Thread.java:701)
> > >>>> > >
> > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > > Trying out the build today
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > >>>> > > >
> > >>>> > > >> This is an issue (a trivial one, though) that needs to be fixed for the
> > >>>> > > >> 0.9 release. I will be rerolling the release today (in the next few
> > >>>> > > >> hours) and putting out a new release candidate in staging.
> > >>>> > > >>
> > >>>> > > >> Thanks for reporting this Andrew P.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
> > >>>> > > >>
> > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
> > >>>> > > >> a bit of trouble getting the Hadoop natives to compile, so some of the
> > >>>> > > >> problems below may be caused by my Hadoop setup. I ran into problems in
> > >>>> > > >> the example scripts, particularly with ./cluster-syntheticcontrol.sh
> > >>>> > > >> options 4 and 5. I will run through the rest of the examples when I'm
> > >>>> > > >> sure I've got Hadoop set up right.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> > >>>> "amd64",
> > >>>> > > >> family: "unix"
> > >>>> > > >> $MAHOUT_LOCAL=true
> > >>>> > > >> Hadoop 2.2.0
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
> > >>>> > > >>
> > >>>> > > >> b) Verify u r able to compile the distro
> > >>>> > > >>
> > >>>> > > >>     mvn compile- [passed with warnings]
> > >>>> > > >>
> > >>>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> > >>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > >>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > >>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
> > >>>> > > >>
> > >>>> > > >> c)  Run through the unit tests: mvn clean test
> > >>>> > > >>     mvn clean test [passed]
> > >>>> > > >>
> > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >>>> > > >> Please run through all the different options in each script
> > >>>> > > >>
> > >>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>>> > > >>     [...]
> > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>>> > > >>
> > >>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >>         at java.lang.Class.forName0(Native Method)
> > >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > >>>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
> > >>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>     cluster-reuters.sh ->1 [works]
> > >>>> > > >>     cluster-reuters.sh ->2 [works]
> > >>>> > > >>     cluster-reuters.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>     Same error as noted previously in the thread:
> > >>>> > > >>
> > >>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
> > >>>> > > >>
> > >>>> > > >>     [...]
> > >>>> > > >>
> > >>>> > > >>     WARNING: No qualcluster.props found on classpath, will use
> > >>>> > > >> command-line arguments only
> > >>>> > > >>     Num clusters: 0; maxDistance: 0.000000
> > >>>> > > >>     [Dunn Index] First: Infinity
> > >>>> > > >>     [Davies-Bouldin Index] First: NaN
> > >>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > >>>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >>>> > > >> > From: suneel_marthi@yahoo.com
> > >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >>>> > > >> >
> > >>>> > > >> > Third time's a Charm!!!
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> > >>>> > > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >>>> > > >> >
> > >>>> > > >> > For those volunteering to test this, some of the things to be verified:
> > >>>> > > >> >
> > >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> > >>>> > > >> > b) Verify u r able to compile the distro
> > >>>> > > >> > c)  Run through the unit tests: mvn clean test
> > >>>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Committers
> > >>>> > > >> >  and PMC members:
> > >>>> > > >> > ---------------------------------------
> > >>>> > > >> >
> > >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Thanks and Regards.
> > >>>> > > >>
> > >>>> > > >
> > >>>> > > >
> > >>>> > >
> > >>>>
> > >>>
> > >>>
> > >>
> > >

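The two confusion matrices quoted above (the "standard" and "complementary" classification runs) can be cross-checked against their reported summary lines. The sketch below is not part of the original thread. It assumes, as the row totals in the quoted output suggest, that rows are actual classes and columns are predicted classes; the names STANDARD, COMPLEMENTARY, and summarize are made up for illustration, and Python is used only for brevity.

```python
# Confusion matrices copied from the quoted classifier output.
# Assumption: rows are actual classes (dev, general, user, commits),
# columns are predicted classes in the same order.
STANDARD = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]
COMPLEMENTARY = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]


def summarize(matrix):
    """Return (correct, total, accuracy_percent) for a square confusion matrix."""
    # Correct predictions sit on the diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, 100.0 * correct / total


for name, matrix in (("standard", STANDARD), ("complementary", COMPLEMENTARY)):
    correct, total, accuracy = summarize(matrix)
    print(f"{name}: {correct}/{total} correct, accuracy {accuracy:.4f}%")
```

Run as a script, this gives 5384/6133 (87.7874%) for the standard run and 5530/6133 (90.1679%) for the complementary run, which agrees with the "Correctly Classified Instances" lines in the quoted output.
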
RE: MAHOUT 0.9 Release - New URL

Posted by Andrew Palumbo <ap...@outlook.com>.
will do!

> Date: Wed, 22 Jan 2014 01:24:05 -0800
> From: suneel_marthi@yahoo.com
> Subject: Re: MAHOUT 0.9 Release - New URL
> To: dev@mahout.apache.org; user@mahout.apache.org
> 
> Andrew M., Andrew P. and others,
> 
> Sebastian and I fixed a few issues today (for 0.9):
> 
> a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references/invocations to algorithms that have been removed from the codebase.
> b) Fixed the issue with Streaming Kmeans clustering and checked in the code.  
> c) Resurrected Frequent Pattern Mining implementation for 0.9.
> 
> Please check out the latest code from trunk, run a build locally and run through the example scripts. 
> 
> Thanks and Regards.
> 
> 
> 
> 
> 
> 
> On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
>  
> *factorize-movielens-1M.sh:*
> RMSE is:
> 
> 0.8519064098265133
> 
> 
> Sample recommendations:
> 
> 2229    [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> 5848    [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> 3728    [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> 1252    [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> 634     [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> 5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> 2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> 4219    [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> 91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> 502     [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> 
> factorize-netflix.sh:
> References a data set that is no longer available (Netflix took it down
> after the competition); the script should at least mention that the data
> set is no longer "online".
> 
> 
> On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
> 
> > *cluster-syntheticcontrol.sh*
> >
> > *Canopy:*
> > [snip]
> >         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > 10.017, 17.149, 14.850, 10.890]
> >         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > 16.430, 11.898, 15.366]
> >         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > 15.285, 22.528, 20.657, 24.129]
> >         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > 20.229, 11.131, 9.980, 10.720]
> >         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > 16.546, 15.927, 18.084, 17.475]
> >         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > 19.310, 12.999, 17.460]
> >         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > 11.743, 11.699, 10.152]
> > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Wrote 6 clusters
> > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> >
> > *K-means:*
> > [snip]
> >         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > 36.946, 36.900, 32.593, 31.563]
> >         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > 18.967, 35.473, 30.146, 45.527]
> >         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > 41.024, 37.105, 45.623, 43.589]
> >         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > 24.508, 34.432, 32.155, 34.839]
> >         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > 18.111, 14.754, 27.386, 27.026]
> >         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > 39.404, 38.034, 46.458, 44.432]
> >         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > 28.215, 34.940, 36.910, 39.749]
> > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Wrote 6 clusters
> > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Program took 16902 ms (Minutes: 0.2817)
> >
> > *Fuzzy k-means:*
> > [snip]
> >         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > 15.285, 22.528, 20.657, 24.129]
> >         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > 20.229, 11.131, 9.980, 10.720]
> >         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > 16.546, 15.927, 18.084, 17.475]
> >         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > 19.310, 12.999, 17.460]
> >         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > 11.743, 11.699, 10.152]
> > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Wrote 6 clusters
> > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> >
> > *Dirichlet and Meanshift:*
> > Already detailed in MAHOUT-1400: deprecated jobs are still referenced.
> >
> >
> >
> > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> >> *cluster-reuters.sh*
> >> *k-means:*
> >>
> >> [snip]
> >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> >>         Top Terms:
> >>                 banks                                   => 3.841823268955143
> >>                 bank                                    => 3.80633066361209
> >>                 debt                                    => 3.28065219870794
> >>                 said                                    => 2.5965700942088583
> >>                 he                                      => 2.335682813857497
> >>                 foreign                                 => 2.2217853688201403
> >>                 billion                                 => 2.1970193848291335
> >>                 would                                   => 1.9932392063955617
> >>                 loans                                   => 1.9309276792854233
> >>                 interest                                => 1.787324501938
> >>                 have                                    => 1.762981951432578
> >>                 its                                     => 1.7615109954971866
> >>                 which                                   => 1.5822081148036862
> >>                 has                                     => 1.5600708189041956
> >>                 dlrs                                    => 1.5571038313005996
> >>                 finance                                 => 1.5539758811252924
> >>                 new                                     => 1.5176015811577555
> >>                 had                                     => 1.5138723701401844
> >>                 brazil                                  => 1.5083369853593172
> >>                 payments                                => 1.4539044255886517
> >>         Weight : [props - optional]:  Point:
> >>
> >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> >> 0.40:0.003, 0.5:0.009, 0.57:0
> >>         Top Terms:
> >>                 vs                                      => 6.126130791333171
> >>                 net                                     => 4.012191567277523
> >>                 cts                                     => 3.822006848832744
> >>                 shr                                     => 3.6786004856764527
> >>                 mln                                     => 2.9011643584038698
> >>                 loss                                    => 2.788368861463607
> >>                 qtr                                     => 2.714140225051522
> >>                 revs                                    => 2.4739861236454717
> >>                 profit                                  => 1.8146888090247015
> >>                 note                                    => 1.7977163272138388
> >>                 dlrs                                    => 1.6164390808155846
> >>                 avg                                     => 1.3901765773336587
> >>                 shrs                                    => 1.3856326531419314
> >>                 mths                                    => 1.3168717272038506
> >>                 4th                                     => 1.2161158425617289
> >>                 oper                                    => 1.182419473776814
> >>                 year                                    => 1.178086061733047
> >>                 nine                                    => 1.0670554836445316
> >>                 3rd                                     => 1.041334410056592
> >>                 inc                                     => 1.0019361981554935
> >>         Weight : [props - optional]:  Point:
> >>
> >>
> >> Inter-Cluster Density: 0.45562152681859414
> >> Intra-Cluster Density: 0.6952712632167628
> >> CDbw Inter-Cluster Density: 0.0
> >> CDbw Intra-Cluster Density: 16.486930227598684
> >> CDbw Separation: 194.49005884464628
> >>
> >> *fuzzy k-means:*
> >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> >> 0.01:0.005, 0.02:0.002, 0.0
> >>         Top Terms:
> >>                 said                                    => 1.8665592354713065
> >>                 its                                     => 1.1335212213411592
> >>                 pct                                     => 1.0862816801353348
> >>                 dlrs                                    => 1.0854998884993752
> >>                 mln                                     => 1.043163996400643
> >>                 from                                    => 0.9684961110525736
> >>                 has                                     => 0.912161511978058
> >>                 company                                 => 0.8754186972808333
> >>                 mar                                     => 0.8675333452422878
> >>                 inc                                     => 0.7678617590362815
> >>                 would                                   => 0.7610968883652675
> >>                 he                                      => 0.7459988770503974
> >>                 which                                   => 0.7435613119406804
> >>                 year                                    => 0.7302840632748394
> >>                 u.s                                     => 0.7281061062439116
> >>                 shares                                  => 0.7260764102983083
> >>                 corp                                    => 0.7179807367808658
> >>                 new                                     => 0.7044203783157115
> >>                 stock                                   => 0.6962010978721442
> >>                 have                                    => 0.6464265467298506
> >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> >> 0.01:0.004, 0.02:0.002, 0.02
> >>         Top Terms:
> >>                 said                                    => 1.864911184196927
> >>                 dlrs                                    => 1.199286689822081
> >>                 mln                                     => 1.1802134783562215
> >>                 pct                                     => 1.1529704214798124
> >>                 its                                     => 1.1184398851519701
> >>                 from                                    => 1.016647848050332
> >>                 company                                 => 0.894703604722841
> >>                 mar                                     => 0.879986159541356
> >>                 has                                     => 0.8642799128491316
> >>                 year                                    => 0.8271823503717782
> >>                 inc                                     => 0.7871293745341424
> >>                 corp                                    => 0.737705498468879
> >>                 which                                   => 0.722975201852743
> >>                 would                                   => 0.708000816484415
> >>                 u.s                                     => 0.7073294276173905
> >>                 billion                                 => 0.7055723996916351
> >>                 he                                      => 0.7042684217823294
> >>                 new                                     => 0.6834737905434939
> >>                 shares                                  => 0.6753327384172428
> >>                 stock                                   => 0.6576225144041699
> >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> >> 0.01:0.006, 0.02:0.002, 0.02
> >>         Top Terms:
> >>                 said                                    => 1.8796076179735086
> >>                 its                                     => 1.172025965452378
> >>                 dlrs                                    => 1.130422792460914
> >>                 pct                                     => 1.082038255241358
> >>                 mln                                     => 1.0772146872767114
> >>                 company                                 => 0.9662235879639138
> >>                 from                                    => 0.9473172871605616
> >>                 has                                     => 0.9224712965830099
> >>                 mar                                     => 0.8769325856924421
> >>                 inc                                     => 0.8360245257169788
> >>                 shares                                  => 0.8334595641384324
> >>                 stock                                   => 0.7704621839612175
> >>                 corp                                    => 0.7682400250301806
> >>                 which                                   => 0.7389988207856137
> >>                 would                                   => 0.7339708917389389
> >>                 year                                    => 0.7088414843731325
> >>                 new                                     => 0.7038109468655172
> >>                 he                                      => 0.6993994455501005
> >>                 u.s                                     => 0.6772649147622415
> >>                 share                                   => 0.6241804830055171
> >>
> >> *lda:*
> >>
> >> [snip]
> >> 21539
> >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> >> 21540
> >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> >> 21541
> >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> >> 21542
> >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> >> 21543
> >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> >> 21544
> >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> >> 21545
> >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> >> 21546
> >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> >> 21547
> >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> >> 21548
> >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> >> 21549
> >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> >> 21550
> >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> >> 21551
> >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> >> 21552
> >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> >> 21553
> >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> >> 21554
> >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> >> 21555
> >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> >> 21556
> >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> >> 21557
> >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> >> 21558
> >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> >> 21559
> >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> >> 21560
> >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> >> 21561
> >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> >> 21562
> >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> >> 21563
> >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> >> 21564
> >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> >> 21565
> >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> >> 21566
> >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> >> 21567
> >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> >> 21568
> >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> >> 21569
> >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> >> 21570
> >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> >> 21571
> >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> >> 21572
> >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> >> 21573
> >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> >> 21574
> >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> >> 21575
> >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> >> 21576
> >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> >> 21577
> >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> >>
> >>
> >> *Streaming k-means:*
> >>
> >> [snip]
> >> INFO: Number of Centroids: 0
> >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> >> WARNING: job_local23982482_0001
> >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> >>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> >>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> >>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> >>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> >>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> >>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> >>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> >>
> >> [snip]
> >>
> >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> >> Num clusters: 0; maxDistance: 0.000000
> >> [Dunn Index] First: Infinity
> >> [Davies-Bouldin Index] First: NaN
> >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> >>
> >>
> >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> >> andrew.musselman@gmail.com> wrote:
> >>
> >>> *classify-20newsgroups.sh*
> >>>
> >>> *Complementary naive bayes:*
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances          :      11207       98.9406%
> >>> Incorrectly Classified Instances        :        120        1.0594%
> >>> Total Classified Instances              :      11327
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> >>> a       b       c       d       e       f       g       h       i
> >>> j       k       l       m       n       o       p       q       r       s
> >>>    t        <--Classified as
> >>> 475     0       0       1       0       0       0       0       0
> >>> 0       0       0       0       0       1       0       1       0       0
> >>>    0         |  478         a     = alt.atheism
> >>> 0       597     1       1       0       1       1       0       0
> >>> 0       0       1       0       2       1       0       0       0       0
> >>>    0         |  605         b     = comp.graphics
> >>> 0       1       620     3       0       1       0       0       0
> >>> 0       0       1       0       0       1       0       0       0       0
> >>>    0         |  627         c     = comp.os.ms-windows.misc
> >>> 1       1       1       593     2       0       0       0       0
> >>> 0       0       0       0       0       0       1       0       0       0
> >>>    0         |  599         d     = comp.sys.ibm.pc.hardware
> >>> 0       1       1       0       568     0       1       0       0
> >>> 0       1       1       2       0       0       0       0       1       0
> >>>    0         |  576         e     = comp.sys.mac.hardware
> >>> 0       4       2       0       0       581     0       0       0
> >>> 0       0       0       0       0       0       0       0       0       0
> >>>    0         |  587         f     = comp.windows.x
> >>> 0       0       0       1       2       0       571     3       0
> >>> 0       1       1       4       1       0       0       0       0       0
> >>>    0         |  584         g     = misc.forsale
> >>> 0       0       0       1       0       0       0       589     1
> >>> 0       0       1       1       0       0       0       0       0       0
> >>>    0         |  593         h     = rec.autos
> >>> 0       0       0       0       0       0       0       1       565
> >>> 0       0       0       0       0       1       0       0       0       0
> >>>    0         |  567         i     = rec.motorcycles
> >>> 0       0       0       0       0       0       0       0       0
> >>> 600     2       0       0       0       1       0       0       0       0
> >>>    0         |  603         j     = rec.sport.baseball
> >>> 0       0       0       0       0       0       0       0       0
> >>> 1       584     0       0       0       0       0       0       0       0
> >>>    0         |  585         k     = rec.sport.hockey
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       579     0       0       0       0       0       1       0
> >>>    0         |  580         l     = sci.crypt
> >>> 0       0       0       1       3       0       2       0       0
> >>> 2       0       0       567     1       2       1       0       0       0
> >>>    0         |  579         m     = sci.electronics
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       0       1       605     0       0       0       0       0
> >>>    0         |  606         n     = sci.med
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       0       0       0       602     0       0       0       0
> >>>    0         |  602         o     = sci.space
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       0       0       1       0       602     0       0       1
> >>>    0         |  604         p     = soc.religion.christian
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       0       0       0       0       0       556     0       0
> >>>    0         |  556         q     = talk.politics.mideast
> >>> 0       0       1       0       0       0       0       0       0
> >>> 0       0       1       0       0       1       0       0       568     0
> >>>    0         |  571         r     = talk.politics.guns
> >>> 11      0       0       0       0       0       0       0       0
> >>> 1       0       0       0       1       3       8       1       4       338
> >>>    2         |  369         s     = talk.religion.misc
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       1       0       0       0       1       0       3       4       0
> >>>    447       |  456         t     = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa                                       0.9806
> >>> Accuracy                                   98.9406%
> >>> Reliability                                94.0932%
> >>> Reliability (standard deviation)            0.2163
> >>>
> >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> >>> + echo 'Testing on holdout set'
> >>> Testing on holdout set
> >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors
> >>> -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex
> >>> -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> >>>
> >>> [snip]
> >>>
> >>> INFO: Complementary Results:
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances          :       6715       89.3071%
> >>> Incorrectly Classified Instances        :        804       10.6929%
> >>> Total Classified Instances              :       7519
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> >>> a       b       c       d       e       f       g       h       i
> >>> j       k       l       m       n       o       p       q       r       s
> >>>    t        <--Classified as
> >>> 298     0       0       0       0       0       0       0       0
> >>> 1       0       0       0       1       2       5       1       0       13
> >>>     0         |  321         a     = alt.atheism
> >>> 0       298     11      6       1       12      2       2       1
> >>> 1       3       8       3       4       2       4       1       4       4
> >>>    1         |  368         b     = comp.graphics
> >>> 1       17      286     16      4       9       6       3       2
> >>> 0       1       0       1       7       1       0       2       1       0
> >>>    1         |  358         c     = comp.os.ms-windows.misc
> >>> 2       6       11      309     9       5       14      8       1
> >>> 0       2       0       6       4       2       0       1       2       1
> >>>    0         |  383         d     = comp.sys.ibm.pc.hardware
> >>> 0       10      8       7       334     7       5       5       2
> >>> 0       3       0       2       1       1       0       1       1       0
> >>>    0         |  387         e     = comp.sys.mac.hardware
> >>> 1       13      7       8       2       355     2       0       2
> >>> 0       0       5       1       1       3       0       0       1       0
> >>>    0         |  401         f     = comp.windows.x
> >>> 0       7       11      29      12      9       268     16      8
> >>> 4       3       2       6       4       2       1       3       1       2
> >>>    3         |  391         g     = misc.forsale
> >>> 0       1       0       0       3       0       7       362     8
> >>> 2       2       1       2       0       2       0       1       2       0
> >>>    4         |  397         h     = rec.autos
> >>> 0       0       0       1       0       0       1       0       423
> >>> 0       0       0       2       1       0       1       0       0       0
> >>>    0         |  429         i     = rec.motorcycles
> >>> 0       0       1       0       0       0       0       2       2
> >>> 371     8       0       2       3       0       2       0       0       0
> >>>    0         |  391         j     = rec.sport.baseball
> >>> 0       0       1       0       0       0       1       0       0
> >>> 2       409     0       0       0       0       0       0       0       0
> >>>    1         |  414         k     = rec.sport.hockey
> >>> 0       0       1       2       1       0       1       0       0
> >>> 0       0       404     0       0       0       0       0       1       0
> >>>    1         |  411         l     = sci.crypt
> >>> 0       5       4       11      1       3       7       9       2
> >>> 5       3       3       339     2       6       0       1       1       2
> >>>    1         |  405         m     = sci.electronics
> >>> 0       4       0       1       0       0       0       1       0
> >>> 1       1       0       3       367     3       1       2       0       0
> >>>    0         |  384         n     = sci.med
> >>> 0       1       2       0       1       0       2       0       0
> >>> 1       0       0       1       1       375     0       1       0       0
> >>>    0         |  385         o     = sci.space
> >>> 4       2       1       1       0       0       1       1       2
> >>> 0       0       1       1       5       1       367     4       0       1
> >>>    1         |  393         p     = soc.religion.christian
> >>> 0       1       0       0       0       0       0       0       0
> >>> 2       0       0       0       0       0       2       378     0       1
> >>>    0         |  384         q     = talk.politics.mideast
> >>> 0       0       0       0       0       2       1       1       1
> >>> 1       0       3       0       3       0       0       2       319     2
> >>>    4         |  339         r     = talk.politics.guns
> >>> 32      0       0       1       0       0       0       0       0
> >>> 1       1       1       0       2       2       26      5       7       175
> >>>    6         |  259         s     = talk.religion.misc
> >>> 0       0       0       2       0       0       0       0       0
> >>> 1       2       2       0       1       2       1       10      18      2
> >>>    278       |  319         t     = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa                                       0.8594
> >>> Accuracy                                   89.3071%
> >>> Reliability                                 84.611%
> >>> Reliability (standard deviation)            0.2148
> >>>
> >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> >>>
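For anyone cross-checking the Complementary NB summaries above: the percentage on the "Correctly Classified Instances" line is just correct / total. A minimal Python sketch that re-derives it from the quoted counts follows; the function name `accuracy_percent` is made up for illustration and is not part of Mahout.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts copied from the two Complementary NB summaries quoted above.
first_run = accuracy_percent(11207, 11327)
holdout = accuracy_percent(6715, 7519)

print(f"first run: {first_run:.4f}%")
print(f"holdout:   {holdout:.4f}%")
```

The same check works for the Standard NB and SGD summaries by swapping in their counts.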
> >>>
> >>> *Naive bayes:*
> >>> INFO: Standard NB Results:
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances          :      11286       99.0869%
> >>> Incorrectly Classified Instances        :        104        0.9131%
> >>> Total Classified Instances              :      11390
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> >>> a       b       c       d       e       f       g       h       i
> >>> j       k       l       m       n       o       p       q       r       s
> >>>    t        <--Classified as
> >>> 474     0       0       0       0       0       0       0       0
> >>> 0       0       0       0       0       0       0       0       0       2
> >>>    1         |  477         a     = alt.atheism
> >>> 0       566     0       2       0       1       0       0       0
> >>> 0       0       0       0       0       0       0       0       0       0
> >>>    0         |  569         b     = comp.graphics
> >>> 0       10      590     29      2       4       1       0       0
> >>> 0       0       0       1       0       0       0       0       0       0
> >>>    1         |  638         c     = comp.os.ms-windows.misc
> >>> 0       0       0       596     0       0       0       0       0
> >>> 0       0       0       0       0       0       0       0       0       0
> >>>    0         |  596         d     = comp.sys.ibm.pc.hardware
> >>> 0       0       0       0       575     0       1       0       0
> >>> 0       0       0       1       0       0       0       0       0       0
> >>>    0         |  577         e     = comp.sys.mac.hardware
> >>> 0       2       2       2       0       593     1       0       0
> >>> 0       0       0       0       0       1       0       0       0       0
> >>>    0         |  601         f     = comp.windows.x
> >>> 0       0       0       1       0       0       589     1       0
> >>> 0       1       0       2       0       0       0       0       0       0
> >>>    0         |  594         g     = misc.forsale
> >>> 0       0       0       0       0       0       0       594     0
> >>> 0       0       0       0       0       0       0       0       0       0
> >>>    0         |  594         h     = rec.autos
> >>> 0       0       0       0       0       0       0       0       611
> >>> 0       0       0       0       0       0       0       0       0       0
> >>>    0         |  611         i     = rec.motorcycles
> >>> 0       0       0       0       0       0       0       0       0
> >>> 616     1       0       0       0       0       0       0       0       0
> >>>    0         |  617         j     = rec.sport.baseball
> >>> 0       0       0       0       0       0       1       0       0
> >>> 0       620     0       0       0       0       0       0       0       0
> >>>    0         |  621         k     = rec.sport.hockey
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       580     0       0       0       0       0       1       0
> >>>    0         |  581         l     = sci.crypt
> >>> 0       0       0       3       1       0       0       0       0
> >>> 0       0       0       571     0       0       0       0       0       0
> >>>    0         |  575         m     = sci.electronics
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       0       2       583     0       0       0       0       0
> >>>    0         |  585         n     = sci.med
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       0       0       1       599     0       0       0       0
> >>>    0         |  600         o     = sci.space
> >>> 0       1       0       0       0       0       0       0       0
> >>> 0       0       0       0       0       0       615     0       0       0
> >>>    0         |  616         p     = soc.religion.christian
> >>> 1       0       0       0       0       0       0       0       0
> >>> 0       0       0       0       0       0       1       560     0       0
> >>>    0         |  562         q     = talk.politics.mideast
> >>> 0       0       1       0       0       0       0       0       0
> >>> 0       0       1       0       0       0       0       0       548     0
> >>>    1         |  551         r     = talk.politics.guns
> >>> 10      0       0       0       0       0       0       0       0
> >>> 0       0       0       0       0       1       1       0       2       344
> >>>    1         |  359         s     = talk.religion.misc
> >>> 0       0       0       0       0       0       0       0       0
> >>> 0       0       1       1       0       0       0       0       2       0
> >>>    462       |  466         t     = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa                                       0.9847
> >>> Accuracy                                   99.0869%
> >>> Reliability                                94.3334%
> >>> Reliability (standard deviation)            0.2169
> >>>
> >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> >>> + echo 'Testing on holdout set'
> >>> Testing on holdout set
> >>>
> >>> [snip]
> >>>
> >>> INFO: Standard NB Results:
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances          :       6718       90.1019%
> >>> Incorrectly Classified Instances        :        738        9.8981%
> >>> Total Classified Instances              :       7456
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> >>> a       b       c       d       e       f       g       h       i
> >>> j       k       l       m       n       o       p       q       r       s
> >>>    t        <--Classified as
> >>> 294     0       0       0       0       0       0       0       0
> >>> 0       0       2       0       1       1       6       1       1       16
> >>>     0         |  322         a     = alt.atheism
> >>> 0       345     6       14      6       11      6       0       0
> >>> 0       0       5       7       1       3       0       0       0       0
> >>>    0         |  404         b     = comp.graphics
> >>> 2       29      177     78      22      19      9       1       0
> >>> 0       0       4       2       0       1       1       0       0       1
> >>>    1         |  347         c     = comp.os.ms-windows.misc
> >>> 1       9       2       335     18      2       10      0       0
> >>> 0       1       0       8       0       0       0       0       0       0
> >>>    0         |  386         d     = comp.sys.ibm.pc.hardware
> >>> 1       4       2       13      347     3       5       1       0
> >>> 0       1       0       7       1       0       0       0       1       0
> >>>    0         |  386         e     = comp.sys.mac.hardware
> >>> 0       20      0       4       0       352     4       0       0
> >>> 0       0       0       1       1       3       0       1       0       1
> >>>    0         |  387         f     = comp.windows.x
> >>> 0       2       0       21      5       1       323     7       2
> >>> 2       0       2       12      0       3       0       0       0       0
> >>>    1         |  381         g     = misc.forsale
> >>> 0       1       0       0       1       0       15      363     8
> >>> 1       0       0       4       1       0       0       0       1       0
> >>>    1         |  396         h     = rec.autos
> >>> 0       1       0       0       0       0       6       6       370
> >>> 0       0       0       0       1       0       0       0       0       1
> >>>    0         |  385         i     = rec.motorcycles
> >>> 1       0       0       1       1       0       2       1       2
> >>> 362     5       0       2       0       0       0       0       0       0
> >>>    0         |  377         j     = rec.sport.baseball
> >>> 0       0       0       1       2       0       0       0       0
> >>> 3       371     0       0       0       0       0       0       0       0
> >>>    1         |  378         k     = rec.sport.hockey
> >>> 0       3       1       0       1       0       2       0       0
> >>> 0       0       396     0       1       0       0       1       1       1
> >>>    3         |  410         l     = sci.crypt
> >>> 0       7       0       7       7       2       6       4       0
> >>> 0       0       1       369     2       2       0       0       0       0
> >>>    2         |  409         m     = sci.electronics
> >>> 0       3       0       2       1       0       2       0       0
> >>> 0       0       1       4       383     4       0       0       1       0
> >>>    4         |  405         n     = sci.med
> >>> 0       5       0       0       1       0       3       0       0
> >>> 0       0       0       1       0       374     1       0       0       1
> >>>    1         |  387         o     = sci.space
> >>> 6       2       0       1       1       0       0       1       0
> >>> 1       0       0       1       5       0       352     2       1       7
> >>>    1         |  381         p     = soc.religion.christian
> >>> 1       1       0       0       0       0       0       0       0
> >>> 0       1       0       0       0       0       0       373     1       0
> >>>    1         |  378         q     = talk.politics.mideast
> >>> 0       0       0       0       0       0       1       0       1
> >>> 0       0       2       0       0       0       0       0       346     2
> >>>    7         |  359         r     = talk.politics.guns
> >>> 26      1       0       1       0       0       0       2       0
> >>> 1       1       0       0       1       1       20      2       6       200
> >>>    7         |  269         s     = talk.religion.misc
> >>> 1       0       0       0       0       0       0       2       0
> >>> 0       1       0       0       2       2       0       1       14      0
> >>>    286       |  309         t     = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa                                       0.8726
> >>> Accuracy                                   90.1019%
> >>> Reliability                                85.4491%
> >>> Reliability (standard deviation)            0.2222
> >>>
> >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> >>>
> >>> *SGD:*
> >>> 7532 test files
> >>>
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances          :       5649            75%
> >>> Incorrectly Classified Instances        :       1883            25%
> >>> Total Classified Instances              :       7532
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> >>> a       b       c       d       e       f       g       h       i
> >>> j       k       l       m       n       o       p       q       r       s
> >>>    t        <--Classified as
> >>> 186     6       3       10      5       0       33      4       13
> >>>  15      7       1       24      15      3       15      5       5       29
> >>>     15        |  394         a     = sci.space
> >>> 5       309     0       3       2       5       0       0       0
> >>> 1       9       21      2       0       0       18      4       4       1
> >>>    1         |  385         b     = comp.sys.mac.hardware
> >>> 4       1       101     3       0       1       63      0       7
> >>> 0       1       1       5       16      3       0       3       7       1
> >>>    34        |  251         c     = talk.religion.misc
> >>> 11      12      1       265     1       10      3       0       0
> >>> 17      10      11      5       2       0       11      3       6       21
> >>>     0         |  389         d     = comp.graphics
> >>> 2       1       1       0       349     2       3       0       3
> >>> 2       6       1       5       1       0       2       15      2       1
> >>>    2         |  398         e     = rec.motorcycles
> >>> 7       20      3       19      2       254     6       0       2
> >>> 11      2       39      7       2       0       4       2       2       9
> >>>    3         |  394         f     = comp.os.ms-windows.misc
> >>> 2       1       13      0       0       0       247     0       1
> >>> 1       3       0       6       2       4       0       2       3       5
> >>>    29        |  319         g     = alt.atheism
> >>> 1       1       0       0       2       0       2       361     0
> >>> 1       2       0       2       0       0       1       3       22      0
> >>>    1         |  399         h     = rec.sport.hockey
> >>> 3       0       3       1       0       0       5       0       161
> >>> 0       1       2       12      102     0       0       1       2       11
> >>>     6         |  310         i     = talk.politics.misc
> >>> 2       8       0       19      0       19      0       0       1
> >>> 294     10      11      4       2       0       5       0       3       11
> >>>     6         |  395         j     = comp.windows.x
> >>> 2       10      0       1       1       0       0       0       0
> >>> 1       347     13      2       1       0       5       3       2       2
> >>>    0         |  390         k     = misc.forsale
> >>> 1       36      0       6       1       25      0       0       1
> >>> 6       10      257     2       1       0       34      6       0       6
> >>>    0         |  392         l     = comp.sys.ibm.pc.hardware
> >>> 2       2       2       2       1       0       12      0       0
> >>> 6       10      4       312     5       2       13      11      3       3
> >>>    6         |  396         m     = sci.med
> >>> 2       0       3       2       1       0       0       1       13
> >>>  0       5       1       2       314     2       0       2       2       10
> >>>     4         |  364         n     = talk.politics.guns
> >>> 1       0       2       1       1       0       34      1       33
> >>>  1       3       0       1       8       271     1       4       5       6
> >>>      3         |  376         o     = talk.politics.mideast
> >>> 3       14      0       8       2       8       3       1       1
> >>> 7       12      29      6       2       1       245     13      2       32
> >>>     4         |  393         p     = sci.electronics
> >>> 3       3       0       2       11      0       1       0       2
> >>> 1       11      6       4       2       0       11      330     4       4
> >>>    1         |  396         q     = rec.autos
> >>> 0       0       1       0       1       0       4       12      3
> >>> 1       3       0       0       0       0       5       6       359     1
> >>>    1         |  397         r     = rec.sport.baseball
> >>> 0       1       0       0       0       1       0       0       3
> >>> 3       0       0       3       2       1       6       1       6       366
> >>>    3         |  396         s     = sci.crypt
> >>> 0       2       11      1       1       0       40      0       1
> >>> 2       3       4       2       1       0       5       0       2       2
> >>>    321       |  398         t     = soc.religion.christian
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa                                       0.7073
> >>> Accuracy                                        75%
> >>> Reliability                                70.6238%
> >>> Reliability (standard deviation)            0.2187
> >>> Log-likelihood                mean      :    -1.1182
> >>>                               25%-ile   :    -1.6911
> >>>                               75%-ile   :    -0.0803
> >>>
> >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> >>>
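As background on the "Kappa" line in the Statistics blocks above, here is a sketch of the textbook Cohen's kappa computed from a confusion matrix. This is an assumption-laden illustration: Mahout may compute its reported value differently, so the sketch is not claimed to reproduce the figures in these logs. The function name `cohen_kappa` and the 2x2 toy matrix are hypothetical.

```python
from typing import List


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Textbook Cohen's kappa for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total

    # Agreement expected by chance, from the row and column marginals.
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Toy 2-class matrix (hypothetical data, not taken from the logs).
toy = [[45, 5],
       [10, 40]]
print(round(cohen_kappa(toy), 4))
```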
> >>>
> >>>
> >>>
> >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
> >>>
> >>>> Thanks Andrew for reporting that. I rolled back the release to fix this
> >>>> and a few other issues.
> >>>>
> >>>> We have removed asf-examples*.sh from trunk as the sample file at the
> >>>> URL mentioned in your email is not available.
> >>>> This is something we need to fix and restore in 1.0.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> >>>> ap.dev@outlook.com> wrote:
> >>>>
> >>>> from the asf-email-examples.sh script:
> >>>>
> >>>> # You will need to download or otherwise obtain some or all of the
> >>>> # Amazon ASF Email Public Dataset
> >>>> # (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> >>>> # To obtain a full copy you will need to launch an EC2 instance and
> >>>> # mount the dataset to download it, otherwise you can get a sample of it at
> >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> >>>>
> >>>> It looks like the:
> >>>>
> >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> >>>>
> >>>> link is down.
> >>>>
> >>>> Is there somewhere else that we can get a subset of the ASF emails?
> >>>>
> >>>>
> >>>>
> >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> >>>> > From: andrew.musselman@gmail.com
> >>>> > To: dev@mahout.apache.org
> >>>> >
> >>>> > Sure thing; continuing to smoke test the other examples tonight
> >>>> >
> >>>> >
> >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> >>>> suneel_marthi@yahoo.com>wrote:
> >>>> >
> >>>> > > Thanks Andrew M. I see that some of the example scripts need to be
> >>>> > > fixed, as they still refer to deprecated algorithms.
> >>>> > > I also see that Streaming KMeans has failed for you.
> >>>> > >
> >>>> > > I'll be rolling back the release today to fix these issues.
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> >>>> > > andrew.musselman@gmail.com> wrote:
> >>>> > >
> >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> >>>> > > 64-bit Linux AMI from tarball.
> >>>> > >
> >>>> > > All tests pass.
> >>>> > >
> >>>> > > *Output of examples:*
> >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> >>>> > > <http://mahout.apache.org>:*
> >>>> > > *recommendations:*
> >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> >>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> >>>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> >>>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> >>>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> >>>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
> >>>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> >>>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> >>>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> >>>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> >>>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> >>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> >>>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> >>>> > > [snip]
> >>>> > >
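Each record in the recommendations output above is a user ID followed by a bracketed list of itemID:score pairs. If it helps with eyeballing the results, here is a rough Python sketch for parsing one such record. It assumes the ID and the list sit on one line separated by whitespace (the mail client wrapped several of them), and the helper name `parse_recommendation_line` is made up for illustration.

```python
from typing import List, Tuple


def parse_recommendation_line(line: str) -> Tuple[int, List[Tuple[int, float]]]:
    """Parse 'userID<whitespace>[item:score,item:score,...]' into Python values."""
    user_part, items_part = line.split(None, 1)
    items_part = items_part.strip()
    if not (items_part.startswith("[") and items_part.endswith("]")):
        raise ValueError(f"unexpected record format: {line!r}")

    recommendations = []
    for pair in items_part[1:-1].split(","):
        if not pair:
            continue  # tolerate an empty list "[]"
        item_id, score = pair.split(":")
        recommendations.append((int(item_id), float(score)))
    return int(user_part), recommendations


# Record for user 8, copied from the output quoted above.
sample = "8\t[12758:1.0,19409:1.0,11112:1.0]"
user, recs = parse_recommendation_line(sample)
print(user, recs)
```

Every score in this sample is 1.0, so a quick sanity check is to confirm each user gets at most ten distinct item IDs.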
> >>>> > > *clustering; kmeans:*
> >>>> > > [snip]
> >>>> > >         Weight : [props - optional]:  Point:
> >>>> > >         1.0 : [distance-squared=1.0193102046188427]:
> >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> >>>> 7573:0.204,
> >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> >>>> 9779:0.159,
> >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> >>>> > >         1.0 : [distance-squared=0.9823018320457279]:
> >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> >>>> 5336:0.106,
> >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> >>>> 7832:0.072,
> >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> >>>> > >         1.0 : [distance-squared=0.9509142993214911]:
> >>>> > >
> >>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> >>>> > >  4419:0.076,
> >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> >>>> 7235:0.048,
> >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> >>>> 7683:0.077,
> >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> >>>> 10225:0.081,
> >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> >>>> > >  43685:0.086, 44077:0.308,
> >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> >>>> > > [snip]
> >>>> > >
> >>>> > > *clustering; dirichlet:*
> >>>> > > Get this complaint:
> >>>> > > Running Dirichlet with K = 8
> >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> >>>> > > HADOOP_CONF_DIR=
> >>>> > > MAHOUT-JOB:
> >>>> > >
> >>>> > >
> >>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class:
> >>>> dirichlet
> >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props
> >>>> found on
> >>>> > > classpath, will use command-line arguments only
> >>>> > > Unknown program 'dirichlet' chosen.
> >>>> > >
> >>>> > > *clustering: minhash:*
> >>>> > > Running Minhash
> >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> >>>> > > HADOOP_CONF_DIR=
> >>>> > > MAHOUT-JOB:
> >>>> > >
> >>>> > >
> >>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found
> >>>> on
> >>>> > > classpath, will use command-line arguments only
> >>>> > > Unknown program 'minhash' chosen.
> >>>> > >
> >>>> > > *classification; standard:*
> >>>> > > =======================================================
> >>>> > > Summary
> >>>> > > -------------------------------------------------------
> >>>> > > Correctly Classified Instances          :       5384       87.7874%
> >>>> > > Incorrectly Classified Instances        :        749       12.2126%
> >>>> > > Total Classified Instances              :       6133
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Confusion Matrix
> >>>> > > -------------------------------------------------------
> >>>> > > a       b       c       d       <--Classified as
> >>>> > > 2949    7       531     25       |  3512        a     = dev
> >>>> > > 0       0       0       0        |  0           b     = general
> >>>> > > 99      8       1763    8        |  1878        c     = user
> >>>> > > 41      1       29      672      |  743         d     = commits
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Statistics
> >>>> > > -------------------------------------------------------
> >>>> > > Kappa                                       0.7877
> >>>> > > Accuracy                                   87.7874%
> >>>> > > Reliability                                 53.658%
> >>>> > > Reliability (standard deviation)            0.4911
> >>>> > >
> >>>> > > *classification; complementary:*
> >>>> > > =======================================================
> >>>> > > Summary
> >>>> > > -------------------------------------------------------
> >>>> > > Correctly Classified Instances          :       5530       90.1679%
> >>>> > > Incorrectly Classified Instances        :        603        9.8321%
> >>>> > > Total Classified Instances              :       6133
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Confusion Matrix
> >>>> > > -------------------------------------------------------
> >>>> > > a       b       c       d       <--Classified as
> >>>> > > 3168    0       276     68       |  3512        a     = dev
> >>>> > > 0       0       0       0        |  0           b     = general
> >>>> > > 196     0       1652    30       |  1878        c     = user
> >>>> > > 25      0       8       710      |  743         d     = commits
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Statistics
> >>>> > > -------------------------------------------------------
> >>>> > > Kappa                                       0.8259
> >>>> > > Accuracy                                   90.1679%
> >>>> > > Reliability                                54.7459%
> >>>> > > Reliability (standard deviation)            0.5005
> >>>> > >
> >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms
> >>>> (Minutes:
> >>>> > > 0.34836666666666666)
> >>>> > >
> >>>> > > *classification; sgd, with three categories:*
> >>>> > > Running SGD Training
> >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> >>>> > > HADOOP_CONF_DIR=
> >>>> > > MAHOUT-JOB:
> >>>> > >
> >>>> > >
> >>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> >>>> > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on
> >>>> classpath,
> >>>> > > will use command-line arguments only
> >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> >>>> > > 24168 training files
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> >>>> > > 0.000   0.00    none
> >>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> >>>> > > 0.000   0.00    none
> >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> >>>> > > 1.0019413e-08   1000    -0.607  75.78   none
> >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> >>>> > > 1.0019413e-08   1200    -0.607  75.78   none
> >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> >>>> > > 1.0019413e-08   1400    -0.607  75.78   none
> >>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> >>>> > > 1.0019413e-08   1500    -0.607  75.78   none
> >>>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
> >>>> > > 1.0032261e-08   2000    -0.487  82.65   none
> >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> >>>> > > 1.0011902e-08   2500    -0.439  83.90   none
> >>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> >>>> > > 1.0011902e-08   3000    -0.439  83.90   none
> >>>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> >>>> > > 1.0000001e-08   4000    -0.351  88.14   none
> >>>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> >>>> > > 1.0000000e-08   5000    -0.378  87.10   none
> >>>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
> >>>> > > 1.0000001e-08   6000    -0.372  86.89   none
> >>>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> >>>> > > 1.0000001e-08   7000    -0.334  89.26   none
> >>>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> >>>> > > 1.0000000e-08   8000    -0.368  87.52   none
> >>>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> >>>> > > 1.0000000e-08   10000   -0.374  87.39   none
> >>>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> >>>> > > 1.0000000e-08   12000   -0.298  88.26   none
> >>>> > > Exception in thread "main" java.lang.IllegalStateException:
> >>>> > > java.lang.ArrayIndexOutOfBoundsException: 2
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> >>>> > >         at
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> >>>> > >         at
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> >>>> Method)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >>>> > >         at
> >>>> > > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >>>> > >         at
> >>>> > > org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> >>>> Method)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> >>>> > >         at
> >>>> > >  org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> >>>> > >         at
> >>>> > > org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> >>>> > >         at
> >>>> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >>>> > >         at
> >>>> > >
> >>>> > >
> >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>> > >         at java.lang.Thread.run(Thread.java:701)
> >>>> > >
> >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> >>>> > > andrew.musselman@gmail.com> wrote:
> >>>> > >
> >>>> > > > Trying out the build today
> >>>> > > >
> >>>> > > >
> >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> >>>> suneel_marthi@yahoo.com
> >>>> > > >wrote:
> >>>> > > >
> >>>> > > >> This is an issue (trivial one though) that needs to be fixed for
> >>>> 0.9
> >>>> > > >> Release, will be rerolling the release today (in the next few
> >>>> hrs) and
> >>>> > > >> putting out a new release candidate in staging.
> >>>> > > >>
> >>>> > > >> Thanks for reporting this Andrew P.
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> >>>> > > ap.dev@outlook.com>
> >>>> > > >> wrote:
> >>>> > > >>
> >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).
> >>>> > > >> I had a bit of trouble getting the Hadoop natives to compile, and
> >>>> > > >> may therefore have run into some problems because of the Hadoop
> >>>> > > >> setup. I ran into some problems in the example scripts, particularly
> >>>> > > >> with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest
> >>>> > > >> of the examples when I'm sure I've got Hadoop set up right.
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> >>>> "amd64",
> >>>> > > >> family: "unix"
> >>>> > > >> $MAHOUT_LOCAL=true
> >>>> > > >> Hadoop 2.2.0
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed
> >>>> (tar)
> >>>> > > >> [passed ]
> >>>> > > >>
> >>>> > > >> b) Verify u r able to compile the distro
> >>>> > > >>
> >>>> > > >>     mvn compile- [passed with warnings]
> >>>> > > >>
> >>>> > > >>     [WARNING]  Expected all dependencies to require Scala
> >>>> version: 2.9.3
> >>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires
> >>>> scala
> >>>> > > >> version: 2.9.3
> >>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> >>>> > > >> version: 2.9.2
> >>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
> >>>> > > >>
> >>>> > > >> c)  Run through the unit tests: mvn clean test
> >>>> > > >>     mvn clean test [passed]
> >>>> > > >>
> >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> >>>> > > >> Please run through all the different options in each script
> >>>> > > >>
> >>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> >>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> >>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> >>>> > > >>     [...]
> >>>> > > >>     WARNING: Unable to add class:
> >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >>>> > > >>     java.lang.ClassNotFoundException:
> >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >>>> > > >>         at
> >>>> > > >>  java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >>>> > > >>         at java.security.AccessController.doPrivileged(Native
> >>>> Method)
> >>>> > > >>         at
> >>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >>>> > > >>         at
> >>>> > >  java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >>>> > > >>         at
> >>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >>>> > > >>         at java.lang.Class.forName0(Native Method)
> >>>> > > >>         at java.lang.Class.forName(Class.java:171)
> >>>> > > >>         at
> >>>> > > >>
> >>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >>>> > > >>         at
> >>>> > > >>
> >>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> >>>> > > >>
> >>>> > > >>     WARNING: Unable to add class:
> >>>> > > >>
> >>>> > >  org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >>>> > > >>     java.lang.ClassNotFoundException:
> >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >>>> > > >>         at java.security.AccessController.doPrivileged(Native
> >>>> Method)
> >>>> > > >>         at
> >>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >>>> > > >>         at
> >>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >>>> > > >>         at java.lang.Class.forName0(Native Method)
> >>>> > > >>         at
> >>>> > >  java.lang.Class.forName(Class.java:171)
> >>>> > > >>         at
> >>>> > > >>
> >>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >>>> > > >>         at
> >>>> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> >>>> > > >>     WARNING: No
> >>>> > > >>
> >>>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found
> >>>> > > on
> >>>> > > >> classpath, will use command-line arguments only
> >>>> > > >>     Unknown program
> >>>> > > >>  'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job'
> >>>> chosen.
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
> >>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>     cluster-reuters.sh ->1 [works]
> >>>> > > >>
> >>>> > > >>     cluster-reuters.sh ->2 [works]
> >>>> > > >>     cluster-reuters.sh ->3 [works]
> >>>> > > >>
> >>>> > > >>     Same error as noted previously in the thread:
> >>>> > > >>
> >>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
> >>>> > > >>
> >>>> > > >>     [...]
> >>>> > > >>
> >>>> > > >>     WARNING: No qualcluster.props found on classpath, will use
> >>>> > > >> command-line arguments only
> >>>> > > >>     Num clusters: 0; maxDistance: 0.000000
> >>>> > > >>     [Dunn Index] First: Infinity
> >>>> > > >>     [Davies-Bouldin Index] First: NaN
> >>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> >>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> >>>> > > >>     cluster,distance.mean,distance.sd
> >>>> > > >>
> >>>> > >
> >>>> > >
> >>>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> >>>> > > >> > From: suneel_marthi@yahoo.com
> >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> >>>> > > >> >
> >>>> > > >> > Third time's a Charm!!!
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> >>>> > > >> >
> >>>> > > >>
> >>>> > >
> >>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >>>> > > >> >
> >>>> > > >> > For those volunteering to test this, some of the things to be
> >>>> > > verified:
> >>>> > > >> >
> >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> >>>> > > >> > b) Verify u r able to compile the distro
> >>>> > > >> > c)  Run through the unit tests: mvn clean test
> >>>> > > >> > d) Run the example scripts
> >>>> > > >>  under $MAHOUT_HOME/examples/bin. Please run through all the
> >>>> different
> >>>> > > >> options in each script.
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > Committers
> >>>> > > >> >  and PMC members:
> >>>> > > >> > ---------------------------------------
> >>>> > > >> >
> >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > Thanks and
> >>>> > >  Regards.
> >>>> > > >>
> >>>> > > >
> >>>> > > >
> >>>> > >
> >>>>
> >>>
> >>>
> >>
> >
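
A note on the quoted classification output above: the reported accuracy can be re-derived from the printed confusion matrix, which is a quick sanity check when comparing runs. The sketch below is only an illustration in Python, not Mahout code; the names LABELS, STANDARD and summarize are made up for this example, and it assumes the diagonal of the matrix holds the correctly classified counts.

```python
# Confusion matrix copied from the quoted "classification; standard" output.
# Rows follow the printed order (dev, general, user, commits); columns are
# the "classified as" labels a-d. LABELS/STANDARD/summarize are hypothetical
# names used only for this illustration.
LABELS = ["dev", "general", "user", "commits"]
STANDARD = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]


def summarize(matrix):
    """Return (correct, total, accuracy_percent) for a square confusion matrix."""
    # Correct predictions sit on the diagonal (row label == column label).
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, 100.0 * correct / total


correct, total, accuracy = summarize(STANDARD)
print(f"{correct} of {total} correct, accuracy {accuracy:.4f}%")
# prints: 5384 of 6133 correct, accuracy 87.7874%
```

The same check on the complementary run's matrix gives 5530 of 6133, which matches the 90.1679% in that summary.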

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Andrew M., Andrew P. and others,

Sebastian and I fixed a few issues today (for 0.9):

a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references to, and invocations of, algorithms that have been removed from the codebase.
b) Fixed the issue with Streaming KMeans clustering and checked in the code.
c) Resurrected the Frequent Pattern Mining implementation for 0.9.

Please check out the latest code from trunk, run a build locally and run through the example scripts.

Thanks and Regards.






On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
 
*factorize-movielens-1M.sh:*
RMSE is:

0.8519064098265133


Sample recommendations:

2229
[2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
5848
[1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
3728
[572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
1252
[53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
634
[572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
4219
[53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
502
[953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
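
For anyone comparing runs: the RMSE printed above is the root-mean-square error between predicted and held-out ratings, so lower is better. The following is a minimal Python sketch of that formula only, not the Mahout evaluator; the function name rmse and the sample ratings are hypothetical and are not taken from the MovieLens run.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted vs. held-out ratings, for illustration only.
predicted = [4.5, 3.0, 5.0, 2.0]
actual = [4.0, 3.0, 4.0, 3.0]
print(rmse(predicted, actual))
```

On a 1-5 rating scale, a value around 0.85 like the one reported here means predictions are typically within about one star of the held-out rating.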

factorize-netflix.sh:
References a no-longer-available data set that Netflix took down after the
competition; the script should at least mention that the data set is no
longer online.


On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> *cluster-syntheticcontrol.sh*
>
> *Canopy:*
> [snip]
>         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> 10.017, 17.149, 14.850, 10.890]
>         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> 16.430, 11.898, 15.366]
>         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
>         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
>         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
>         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
>         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
>
> *K-means:*
> [snip]
>         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> 36.946, 36.900, 32.593, 31.563]
>         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> 18.967, 35.473, 30.146, 45.527]
>         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> 41.024, 37.105, 45.623, 43.589]
>         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> 24.508, 34.432, 32.155, 34.839]
>         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> 18.111, 14.754, 27.386, 27.026]
>         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> 39.404, 38.034, 46.458, 44.432]
>         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> 28.215, 34.940, 36.910, 39.749]
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 16902 ms (Minutes: 0.2817)
>
> *Fuzzy k-means:*
> [snip]
>         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
>         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
>         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
>         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
>         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
>
> *Dirichlet and Meanshift:*
> Already detailed in M-1400: the deprecated jobs are still referenced.
>
>
>
> On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> *cluster-reuters.sh*
>> *k-means:*
>>
>> [snip]
>> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
>> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
>>         Top Terms:
>>                 banks        => 3.841823268955143
>>                 bank         => 3.80633066361209
>>                 debt         => 3.28065219870794
>>                 said         => 2.5965700942088583
>>                 he           => 2.335682813857497
>>                 foreign      => 2.2217853688201403
>>                 billion      => 2.1970193848291335
>>                 would        => 1.9932392063955617
>>                 loans        => 1.9309276792854233
>>                 interest     => 1.787324501938
>>                 have         => 1.762981951432578
>>                 its          => 1.7615109954971866
>>                 which        => 1.5822081148036862
>>                 has          => 1.5600708189041956
>>                 dlrs         => 1.5571038313005996
>>                 finance      => 1.5539758811252924
>>                 new          => 1.5176015811577555
>>                 had          => 1.5138723701401844
>>                 brazil       => 1.5083369853593172
>>                 payments     => 1.4539044255886517
>>         Weight : [props - optional]:  Point:
>>
>> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
>> 0.40:0.003, 0.5:0.009, 0.57:0
>>         Top Terms:
>>                 vs           => 6.126130791333171
>>                 net          => 4.012191567277523
>>                 cts          => 3.822006848832744
>>                 shr          => 3.6786004856764527
>>                 mln          => 2.9011643584038698
>>                 loss         => 2.788368861463607
>>                 qtr          => 2.714140225051522
>>                 revs         => 2.4739861236454717
>>                 profit       => 1.8146888090247015
>>                 note         => 1.7977163272138388
>>                 dlrs         => 1.6164390808155846
>>                 avg          => 1.3901765773336587
>>                 shrs         => 1.3856326531419314
>>                 mths         => 1.3168717272038506
>>                 4th          => 1.2161158425617289
>>                 oper         => 1.182419473776814
>>                 year         => 1.178086061733047
>>                 nine         => 1.0670554836445316
>>                 3rd          => 1.041334410056592
>>                 inc          => 1.0019361981554935
>>         Weight : [props - optional]:  Point:
>>
>>
>> Inter-Cluster Density: 0.45562152681859414
>> Intra-Cluster Density: 0.6952712632167628
>> CDbw Inter-Cluster Density: 0.0
>> CDbw Intra-Cluster Density: 16.486930227598684
>> CDbw Separation: 194.49005884464628
>>
>> *fuzzy k-means:*
>> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.005, 0.02:0.002, 0.0
>>         Top Terms:
>>                 said         => 1.8665592354713065
>>                 its          => 1.1335212213411592
>>                 pct          => 1.0862816801353348
>>                 dlrs         => 1.0854998884993752
>>                 mln          => 1.043163996400643
>>                 from         => 0.9684961110525736
>>                 has          => 0.912161511978058
>>                 company      => 0.8754186972808333
>>                 mar          => 0.8675333452422878
>>                 inc          => 0.7678617590362815
>>                 would        => 0.7610968883652675
>>                 he           => 0.7459988770503974
>>                 which        => 0.7435613119406804
>>                 year         => 0.7302840632748394
>>                 u.s          => 0.7281061062439116
>>                 shares       => 0.7260764102983083
>>                 corp         => 0.7179807367808658
>>                 new          => 0.7044203783157115
>>                 stock        => 0.6962010978721442
>>                 have         => 0.6464265467298506
>> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.004, 0.02:0.002, 0.02
>>         Top Terms:
>>                 said         => 1.864911184196927
>>                 dlrs         => 1.199286689822081
>>                 mln          => 1.1802134783562215
>>                 pct          => 1.1529704214798124
>>                 its          => 1.1184398851519701
>>                 from         => 1.016647848050332
>>                 company      => 0.894703604722841
>>                 mar          => 0.879986159541356
>>                 has          => 0.8642799128491316
>>                 year         => 0.8271823503717782
>>                 inc          => 0.7871293745341424
>>                 corp         => 0.737705498468879
>>                 which        => 0.722975201852743
>>                 would        => 0.708000816484415
>>                 u.s          => 0.7073294276173905
>>                 billion      => 0.7055723996916351
>>                 he           => 0.7042684217823294
>>                 new          => 0.6834737905434939
>>                 shares       => 0.6753327384172428
>>                 stock        => 0.6576225144041699
>> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.006, 0.02:0.002, 0.02
>>         Top Terms:
>>                 said         => 1.8796076179735086
>>                 its          => 1.172025965452378
>>                 dlrs         => 1.130422792460914
>>                 pct          => 1.082038255241358
>>                 mln          => 1.0772146872767114
>>                 company      => 0.9662235879639138
>>                 from         => 0.9473172871605616
>>                 has          => 0.9224712965830099
>>                 mar          => 0.8769325856924421
>>                 inc          => 0.8360245257169788
>>                 shares       => 0.8334595641384324
>>                 stock        => 0.7704621839612175
>>                 corp         => 0.7682400250301806
>>                 which        => 0.7389988207856137
>>                 would        => 0.7339708917389389
>>                 year         => 0.7088414843731325
>>                 new          => 0.7038109468655172
>>                 he           => 0.6993994455501005
>>                 u.s          => 0.6772649147622415
>>                 share        => 0.6241804830055171
>>
>> *lda:*
>>
>> [snip]
>> 21539
>> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
>> 21540
>> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
>> 21541
>> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
>> 21542
>> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
>> 21543
>> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
>> 21544
>> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
>> 21545
>> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
>> 21546
>> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
>> 21547
>> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
>> 21548
>> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
>> 21549
>> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
>> 21550
>> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
>> 21551
>> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
>> 21552
>> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
>> 21553
>> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
>> 21554
>> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
>> 21555
>> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
>> 21556
>> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
>> 21557
>> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
>> 21558
>> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
>> 21559
>> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
>> 21560
>> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
>> 21561
>> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
>> 21562
>> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
>> 21563
>> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
>> 21564
>> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
>> 21565
>> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
>> 21566
>> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
>> 21567
>> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
>> 21568
>> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
>> 21569
>> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
>> 21570
>> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
>> 21571
>> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
>> 21572
>> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
>> 21573
>> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
>> 21574
>> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
>> 21575
>> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
>> 21576
>> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
>> 21577
>> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>>
>>
>> *Streaming k-means:*
>>
>> [snip]
>> INFO: Number of Centroids: 0
>> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
>> WARNING: job_local23982482_0001
>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>
>> [snip]
>>
>> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> Num clusters: 0; maxDistance: 0.000000
>> [Dunn Index] First: Infinity
>> [Davies-Bouldin Index] First: NaN
>> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
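For anyone reading the streaming k-means failure above: the message says the reducer was asked to hold out about 10% of 0 vectors, so the train/test split had nothing to split. The raw `%.1f %% of %d` placeholders with the arguments in brackets are probably there because Guava's Preconditions message templates only substitute `%s`. Below is a rough Python sketch of the kind of check the message implies. The function name and signature are made up for illustration; this is not Mahout's actual code.

```python
def split_train_test(num_vectors, test_fraction):
    """Return (train_size, test_size); raise if either side would be empty.

    Hypothetical helper that mirrors the condition described by the log
    message; it is an assumption, not the BallKMeans implementation.
    """
    test_size = int(test_fraction * num_vectors)
    # Both the training part and the test part must be non-empty.
    if not 0 < test_size < num_vectors:
        raise ValueError(
            "Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test"
            % (test_fraction * 100, num_vectors)
        )
    return num_vectors - test_size, test_size


if __name__ == "__main__":
    print(split_train_test(8, 0.25))   # a normal split
    try:
        split_train_test(0, 0.1)       # zero input vectors, as in the log
    except ValueError as err:
        print("failed:", err)
```

With zero input vectors no test fraction can pass the check, which suggests that the thing to look at is why no centroids reached the reducer ("Number of Centroids: 0"), not the split itself.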
>> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
>> andrew.musselman@gmail.com> wrote:
>>
>>> *classify-20newsgroups.sh*
>>>
>>> *Complementary naive bayes:*
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :      11207       98.9406%
>>> Incorrectly Classified Instances        :        120        1.0594%
>>> Total Classified Instances              :      11327
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
>>> 475  0    0    1    0    0    0    0    0    0    0    0    0    0    1    0    1    0    0    0    |  478  a = alt.atheism
>>> 0    597  1    1    0    1    1    0    0    0    0    1    0    2    1    0    0    0    0    0    |  605  b = comp.graphics
>>> 0    1    620  3    0    1    0    0    0    0    0    1    0    0    1    0    0    0    0    0    |  627  c = comp.os.ms-windows.misc
>>> 1    1    1    593  2    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    |  599  d = comp.sys.ibm.pc.hardware
>>> 0    1    1    0    568  0    1    0    0    0    1    1    2    0    0    0    0    1    0    0    |  576  e = comp.sys.mac.hardware
>>> 0    4    2    0    0    581  0    0    0    0    0    0    0    0    0    0    0    0    0    0    |  587  f = comp.windows.x
>>> 0    0    0    1    2    0    571  3    0    0    1    1    4    1    0    0    0    0    0    0    |  584  g = misc.forsale
>>> 0    0    0    1    0    0    0    589  1    0    0    1    1    0    0    0    0    0    0    0    |  593  h = rec.autos
>>> 0    0    0    0    0    0    0    1    565  0    0    0    0    0    1    0    0    0    0    0    |  567  i = rec.motorcycles
>>> 0    0    0    0    0    0    0    0    0    600  2    0    0    0    1    0    0    0    0    0    |  603  j = rec.sport.baseball
>>> 0    0    0    0    0    0    0    0    0    1    584  0    0    0    0    0    0    0    0    0    |  585  k = rec.sport.hockey
>>> 0    0    0    0    0    0    0    0    0    0    0    579  0    0    0    0    0    1    0    0    |  580  l = sci.crypt
>>> 0    0    0    1    3    0    2    0    0    2    0    0    567  1    2    1    0    0    0    0    |  579  m = sci.electronics
>>> 0    0    0    0    0    0    0    0    0    0    0    0    1    605  0    0    0    0    0    0    |  606  n = sci.med
>>> 0    0    0    0    0    0    0    0    0    0    0    0    0    0    602  0    0    0    0    0    |  602  o = sci.space
>>> 0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    602  0    0    1    0    |  604  p = soc.religion.christian
>>> 0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    556  0    0    0    |  556  q = talk.politics.mideast
>>> 0    0    1    0    0    0    0    0    0    0    0    1    0    0    1    0    0    568  0    0    |  571  r = talk.politics.guns
>>> 11   0    0    0    0    0    0    0    0    1    0    0    0    1    3    8    1    4    338  2    |  369  s = talk.religion.misc
>>> 0    0    0    0    0    0    0    0    0    0    1    0    0    0    1    0    3    4    0    447  |  456  t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.9806
>>> Accuracy                                   98.9406%
>>> Reliability                                94.0932%
>>> Reliability (standard deviation)            0.2163
>>>
>>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 15870 ms (Minutes: 0.2645)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors
>>> -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex
>>> -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>>>
>>> [snip]
>>>
>>> INFO: Complementary Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       6715       89.3071%
>>> Incorrectly Classified Instances        :        804       10.6929%
>>> Total Classified Instances              :       7519
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 298     0       0       0       0       0       0       0       0
>>> 1       0       0       0       1       2       5       1       0       13
>>>     0         |  321         a     = alt.atheism
>>> 0       298     11      6       1       12      2       2       1
>>> 1       3       8       3       4       2       4       1       4       4
>>>    1         |  368         b     = comp.graphics
>>> 1       17      286     16      4       9       6       3       2
>>> 0       1       0       1       7       1       0       2       1       0
>>>    1         |  358         c     = comp.os.ms-windows.misc
>>> 2       6       11      309     9       5       14      8       1
>>> 0       2       0       6       4       2       0       1       2       1
>>>    0         |  383         d     = comp.sys.ibm.pc.hardware
>>> 0       10      8       7       334     7       5       5       2
>>> 0       3       0       2       1       1       0       1       1       0
>>>    0         |  387         e     = comp.sys.mac.hardware
>>> 1       13      7       8       2       355     2       0       2
>>> 0       0       5       1       1       3       0       0       1       0
>>>    0         |  401         f     = comp.windows.x
>>> 0       7       11      29      12      9       268     16      8
>>> 4       3       2       6       4       2       1       3       1       2
>>>    3         |  391         g     = misc.forsale
>>> 0       1       0       0       3       0       7       362     8
>>> 2       2       1       2       0       2       0       1       2       0
>>>    4         |  397         h     = rec.autos
>>> 0       0       0       1       0       0       1       0       423
>>> 0       0       0       2       1       0       1       0       0       0
>>>    0         |  429         i     = rec.motorcycles
>>> 0       0       1       0       0       0       0       2       2
>>> 371     8       0       2       3       0       2       0       0       0
>>>    0         |  391         j     = rec.sport.baseball
>>> 0       0       1       0       0       0       1       0       0
>>> 2       409     0       0       0       0       0       0       0       0
>>>    1         |  414         k     = rec.sport.hockey
>>> 0       0       1       2       1       0       1       0       0
>>> 0       0       404     0       0       0       0       0       1       0
>>>    1         |  411         l     = sci.crypt
>>> 0       5       4       11      1       3       7       9       2
>>> 5       3       3       339     2       6       0       1       1       2
>>>    1         |  405         m     = sci.electronics
>>> 0       4       0       1       0       0       0       1       0
>>> 1       1       0       3       367     3       1       2       0       0
>>>    0         |  384         n     = sci.med
>>> 0       1       2       0       1       0       2       0       0
>>> 1       0       0       1       1       375     0       1       0       0
>>>    0         |  385         o     = sci.space
>>> 4       2       1       1       0       0       1       1       2
>>> 0       0       1       1       5       1       367     4       0       1
>>>    1         |  393         p     = soc.religion.christian
>>> 0       1       0       0       0       0       0       0       0
>>> 2       0       0       0       0       0       2       378     0       1
>>>    0         |  384         q     = talk.politics.mideast
>>> 0       0       0       0       0       2       1       1       1
>>> 1       0       3       0       3       0       0       2       319     2
>>>    4         |  339         r     = talk.politics.guns
>>> 32      0       0       1       0       0       0       0       0
>>> 1       1       1       0       2       2       26      5       7       175
>>>    6         |  259         s     = talk.religion.misc
>>> 0       0       0       2       0       0       0       0       0
>>> 1       2       2       0       1       2       1       10      18      2
>>>    278       |  319         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.8594
>>> Accuracy                                   89.3071%
>>> Reliability                                 84.611%
>>> Reliability (standard deviation)            0.2148
>>>
>>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>>
>>>
>>> *Naive bayes:*
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :      11286       99.0869%
>>> Incorrectly Classified Instances        :        104        0.9131%
>>> Total Classified Instances              :      11390
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 474     0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       0       0       0       2
>>>    1         |  477         a     = alt.atheism
>>> 0       566     0       2       0       1       0       0       0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  569         b     = comp.graphics
>>> 0       10      590     29      2       4       1       0       0
>>> 0       0       0       1       0       0       0       0       0       0
>>>    1         |  638         c     = comp.os.ms-windows.misc
>>> 0       0       0       596     0       0       0       0       0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  596         d     = comp.sys.ibm.pc.hardware
>>> 0       0       0       0       575     0       1       0       0
>>> 0       0       0       1       0       0       0       0       0       0
>>>    0         |  577         e     = comp.sys.mac.hardware
>>> 0       2       2       2       0       593     1       0       0
>>> 0       0       0       0       0       1       0       0       0       0
>>>    0         |  601         f     = comp.windows.x
>>> 0       0       0       1       0       0       589     1       0
>>> 0       1       0       2       0       0       0       0       0       0
>>>    0         |  594         g     = misc.forsale
>>> 0       0       0       0       0       0       0       594     0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  594         h     = rec.autos
>>> 0       0       0       0       0       0       0       0       611
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  611         i     = rec.motorcycles
>>> 0       0       0       0       0       0       0       0       0
>>> 616     1       0       0       0       0       0       0       0       0
>>>    0         |  617         j     = rec.sport.baseball
>>> 0       0       0       0       0       0       1       0       0
>>> 0       620     0       0       0       0       0       0       0       0
>>>    0         |  621         k     = rec.sport.hockey
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       580     0       0       0       0       0       1       0
>>>    0         |  581         l     = sci.crypt
>>> 0       0       0       3       1       0       0       0       0
>>> 0       0       0       571     0       0       0       0       0       0
>>>    0         |  575         m     = sci.electronics
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       2       583     0       0       0       0       0
>>>    0         |  585         n     = sci.med
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       0       1       599     0       0       0       0
>>>    0         |  600         o     = sci.space
>>> 0       1       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       615     0       0       0
>>>    0         |  616         p     = soc.religion.christian
>>> 1       0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       1       560     0       0
>>>    0         |  562         q     = talk.politics.mideast
>>> 0       0       1       0       0       0       0       0       0
>>> 0       0       1       0       0       0       0       0       548     0
>>>    1         |  551         r     = talk.politics.guns
>>> 10      0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       1       1       0       2       344
>>>    1         |  359         s     = talk.religion.misc
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       1       1       0       0       0       0       2       0
>>>    462       |  466         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.9847
>>> Accuracy                                   99.0869%
>>> Reliability                                94.3334%
>>> Reliability (standard deviation)            0.2169
>>>
>>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 14304 ms (Minutes: 0.2384)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>>
>>> [snip]
>>>
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       6718       90.1019%
>>> Incorrectly Classified Instances        :        738        9.8981%
>>> Total Classified Instances              :       7456
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 294     0       0       0       0       0       0       0       0
>>> 0       0       2       0       1       1       6       1       1       16
>>>     0         |  322         a     = alt.atheism
>>> 0       345     6       14      6       11      6       0       0
>>> 0       0       5       7       1       3       0       0       0       0
>>>    0         |  404         b     = comp.graphics
>>> 2       29      177     78      22      19      9       1       0
>>> 0       0       4       2       0       1       1       0       0       1
>>>    1         |  347         c     = comp.os.ms-windows.misc
>>> 1       9       2       335     18      2       10      0       0
>>> 0       1       0       8       0       0       0       0       0       0
>>>    0         |  386         d     = comp.sys.ibm.pc.hardware
>>> 1       4       2       13      347     3       5       1       0
>>> 0       1       0       7       1       0       0       0       1       0
>>>    0         |  386         e     = comp.sys.mac.hardware
>>> 0       20      0       4       0       352     4       0       0
>>> 0       0       0       1       1       3       0       1       0       1
>>>    0         |  387         f     = comp.windows.x
>>> 0       2       0       21      5       1       323     7       2
>>> 2       0       2       12      0       3       0       0       0       0
>>>    1         |  381         g     = misc.forsale
>>> 0       1       0       0       1       0       15      363     8
>>> 1       0       0       4       1       0       0       0       1       0
>>>    1         |  396         h     = rec.autos
>>> 0       1       0       0       0       0       6       6       370
>>> 0       0       0       0       1       0       0       0       0       1
>>>    0         |  385         i     = rec.motorcycles
>>> 1       0       0       1       1       0       2       1       2
>>> 362     5       0       2       0       0       0       0       0       0
>>>    0         |  377         j     = rec.sport.baseball
>>> 0       0       0       1       2       0       0       0       0
>>> 3       371     0       0       0       0       0       0       0       0
>>>    1         |  378         k     = rec.sport.hockey
>>> 0       3       1       0       1       0       2       0       0
>>> 0       0       396     0       1       0       0       1       1       1
>>>    3         |  410         l     = sci.crypt
>>> 0       7       0       7       7       2       6       4       0
>>> 0       0       1       369     2       2       0       0       0       0
>>>    2         |  409         m     = sci.electronics
>>> 0       3       0       2       1       0       2       0       0
>>> 0       0       1       4       383     4       0       0       1       0
>>>    4         |  405         n     = sci.med
>>> 0       5       0       0       1       0       3       0       0
>>> 0       0       0       1       0       374     1       0       0       1
>>>    1         |  387         o     = sci.space
>>> 6       2       0       1       1       0       0       1       0
>>> 1       0       0       1       5       0       352     2       1       7
>>>    1         |  381         p     = soc.religion.christian
>>> 1       1       0       0       0       0       0       0       0
>>> 0       1       0       0       0       0       0       373     1       0
>>>    1         |  378         q     = talk.politics.mideast
>>> 0       0       0       0       0       0       1       0       1
>>> 0       0       2       0       0       0       0       0       346     2
>>>    7         |  359         r     = talk.politics.guns
>>> 26      1       0       1       0       0       0       2       0
>>> 1       1       0       0       1       1       20      2       6       200
>>>    7         |  269         s     = talk.religion.misc
>>> 1       0       0       0       0       0       0       2       0
>>> 0       1       0       0       2       2       0       1       14      0
>>>    286       |  309         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.8726
>>> Accuracy                                   90.1019%
>>> Reliability                                85.4491%
>>> Reliability (standard deviation)            0.2222
>>>
>>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>>
>>> *SGD:*
>>> 7532 test files
>>>
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       5649            75%
>>> Incorrectly Classified Instances        :       1883            25%
>>> Total Classified Instances              :       7532
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 186     6       3       10      5       0       33      4       13
>>>  15      7       1       24      15      3       15      5       5       29
>>>     15        |  394         a     = sci.space
>>> 5       309     0       3       2       5       0       0       0
>>> 1       9       21      2       0       0       18      4       4       1
>>>    1         |  385         b     = comp.sys.mac.hardware
>>> 4       1       101     3       0       1       63      0       7
>>> 0       1       1       5       16      3       0       3       7       1
>>>    34        |  251         c     = talk.religion.misc
>>> 11      12      1       265     1       10      3       0       0
>>> 17      10      11      5       2       0       11      3       6       21
>>>     0         |  389         d     = comp.graphics
>>> 2       1       1       0       349     2       3       0       3
>>> 2       6       1       5       1       0       2       15      2       1
>>>    2         |  398         e     = rec.motorcycles
>>> 7       20      3       19      2       254     6       0       2
>>> 11      2       39      7       2       0       4       2       2       9
>>>    3         |  394         f     = comp.os.ms-windows.misc
>>> 2       1       13      0       0       0       247     0       1
>>> 1       3       0       6       2       4       0       2       3       5
>>>    29        |  319         g     = alt.atheism
>>> 1       1       0       0       2       0       2       361     0
>>> 1       2       0       2       0       0       1       3       22      0
>>>    1         |  399         h     = rec.sport.hockey
>>> 3       0       3       1       0       0       5       0       161
>>> 0       1       2       12      102     0       0       1       2       11
>>>     6         |  310         i     = talk.politics.misc
>>> 2       8       0       19      0       19      0       0       1
>>> 294     10      11      4       2       0       5       0       3       11
>>>     6         |  395         j     = comp.windows.x
>>> 2       10      0       1       1       0       0       0       0
>>> 1       347     13      2       1       0       5       3       2       2
>>>    0         |  390         k     = misc.forsale
>>> 1       36      0       6       1       25      0       0       1
>>> 6       10      257     2       1       0       34      6       0       6
>>>    0         |  392         l     = comp.sys.ibm.pc.hardware
>>> 2       2       2       2       1       0       12      0       0
>>> 6       10      4       312     5       2       13      11      3       3
>>>    6         |  396         m     = sci.med
>>> 2       0       3       2       1       0       0       1       13
>>>  0       5       1       2       314     2       0       2       2       10
>>>     4         |  364         n     = talk.politics.guns
>>> 1       0       2       1       1       0       34      1       33
>>>  1       3       0       1       8       271     1       4       5       6
>>>      3         |  376         o     = talk.politics.mideast
>>> 3       14      0       8       2       8       3       1       1
>>> 7       12      29      6       2       1       245     13      2       32
>>>     4         |  393         p     = sci.electronics
>>> 3       3       0       2       11      0       1       0       2
>>> 1       11      6       4       2       0       11      330     4       4
>>>    1         |  396         q     = rec.autos
>>> 0       0       1       0       1       0       4       12      3
>>> 1       3       0       0       0       0       5       6       359     1
>>>    1         |  397         r     = rec.sport.baseball
>>> 0       1       0       0       0       1       0       0       3
>>> 3       0       0       3       2       1       6       1       6       366
>>>    3         |  396         s     = sci.crypt
>>> 0       2       11      1       1       0       40      0       1
>>> 2       3       4       2       1       0       5       0       2       2
>>>    321       |  398         t     = soc.religion.christian
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.7073
>>> Accuracy                                        75%
>>> Reliability                                70.6238%
>>> Reliability (standard deviation)            0.2187
>>> Log-likelihood                mean      :    -1.1182
>>>                               25%-ile   :    -1.6911
>>>                               75%-ile   :    -0.0803
>>>
>>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>>
>>>
>>>
>>>
>>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
>>>
>>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
>>>> and a few other issues.
>>>>
>>>> We have removed asf-examples*.sh from trunk, as the sample file at the
>>>> URL mentioned in your email is not available.
>>>> This is something we need to fix and restore in 1.0.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
>>>> ap.dev@outlook.com> wrote:
>>>>
>>>> from the asf-email-examples.sh script:
>>>>
>>>> # You will need to download or otherwise obtain some or all of the
>>>> # Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
>>>> # to use this script.
>>>> # To obtain a full copy you will need to launch an EC2 instance and
>>>> # mount the dataset to download it, otherwise you can get a sample of it at
>>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> It looks like the:
>>>>
>>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> link is down.
>>>>
>>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>>
>>>>
>>>>
>>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>>> > From: andrew.musselman@gmail.com
>>>> > To: dev@mahout.apache.org
>>>> >
>>>> > Sure thing; continuing to smoke test the other examples tonight
>>>> >
>>>> >
>>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
>>>> >
>>>> > > Thanks, Andrew M. I see that some of the example scripts need to be
>>>> > > fixed, as they still refer to the deprecated algorithms.
>>>> > > I see that Streaming KMeans has failed for you as well.
>>>> > >
>>>> > > I'll be rolling back the release today to fix these issues.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
>>>> > > 64-bit Linux AMI from tarball.
>>>> > >
>>>> > > All tests pass.
>>>> > >
>>>> > > *Output of examples:*
>>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>>> > > <http://mahout.apache.org>:*
>>>> > > *recommendations:*
>>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
>>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
>>>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
>>>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>>> > > [snip]
>>>> > >
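For anyone post-processing this recommender output, here is a minimal Python sketch. It assumes each record in part-r-00000 is a user ID, a tab, and a bracketed list of itemID:score pairs; the tab separator is an assumption, since the mail client has re-wrapped the lines. The function name parse_recommendation is made up for illustration and is not part of Mahout.

```python
def parse_recommendation(line):
    """Split one 'userID<TAB>[item:score,...]' record into (user, [(item, score), ...])."""
    user, _, items = line.strip().partition("\t")
    pairs = []
    for entry in items.strip("[]").split(","):
        if not entry:
            continue  # empty list, e.g. "[]"
        item, _, score = entry.partition(":")
        pairs.append((int(item), float(score)))
    return int(user), pairs


# Record for user 8, copied from the output above (tab separator assumed).
user, recs = parse_recommendation("8\t[12758:1.0,19409:1.0,11112:1.0]")
print(user, recs)
```

Every score in this run is 1.0, so the parsed pairs mainly tell you which items were recommended, not how they rank against each other.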
>>>> > > *clustering; kmeans:*
>>>> > > [snip]
>>>> > >         Weight : [props - optional]:  Point:
>>>> > >         1.0 : [distance-squared=1.0193102046188427]:
>>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
>>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>>> 7573:0.204,
>>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>>> 9779:0.159,
>>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>>> > >         1.0 : [distance-squared=0.9823018320457279]:
>>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>>> 5336:0.106,
>>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>>> 7832:0.072,
>>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>>> > >         1.0 : [distance-squared=0.9509142993214911]:
>>>> > >
>>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
>>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
>>>> > >  4419:0.076,
>>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>>>> 7235:0.048,
>>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>>> 7683:0.077,
>>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>>> 10225:0.081,
>>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
>>>> > >  43685:0.086, 44077:0.308,
>>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>>> > > [snip]
>>>> > >
>>>> > > *clustering; dirichlet:*
>>>> > > Get this complaint:
>>>> > > Running Dirichlet with K = 8
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
>>>> > > Unknown program 'dirichlet' chosen.
>>>> > >
>>>> > > *clustering: minhash:*
>>>> > > Running Minhash
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
>>>> > > Unknown program 'minhash' chosen.
>>>> > >
>>>> > > *classification; standard:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances          :       5384       87.7874%
>>>> > > Incorrectly Classified Instances        :        749       12.2126%
>>>> > > Total Classified Instances              :       6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a       b       c       d       <--Classified as
>>>> > > 2949    7       531     25       |  3512        a     = dev
>>>> > > 0       0       0       0        |  0           b     = general
>>>> > > 99      8       1763    8        |  1878        c     = user
>>>> > > 41      1       29      672      |  743         d     = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa                                       0.7877
>>>> > > Accuracy                                   87.7874%
>>>> > > Reliability                                 53.658%
>>>> > > Reliability (standard deviation)            0.4911
>>>> > >
>>>> > > *classification; complementary:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances          :       5530       90.1679%
>>>> > > Incorrectly Classified Instances        :        603        9.8321%
>>>> > > Total Classified Instances              :       6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a       b       c       d       <--Classified as
>>>> > > 3168    0       276     68       |  3512        a     = dev
>>>> > > 0       0       0       0        |  0           b     = general
>>>> > > 196     0       1652    30       |  1878        c     = user
>>>> > > 25      0       8       710      |  743         d     = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa                                       0.8259
>>>> > > Accuracy                                   90.1679%
>>>> > > Reliability                                54.7459%
>>>> > > Reliability (standard deviation)            0.5005
>>>> > >
>>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>>>> > >
>>>> > > *classification; sgd, with three categories:*
>>>> > > Running SGD Training
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
>>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
>>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
>>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
>>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
>>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>>> > > 24168 training files
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       1       0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       2       0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       3       0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       4       0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       6       0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       8       0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       10      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       12      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       15      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       20      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       25      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       30      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       40      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       50      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       60      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       70      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       80      0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       100     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       120     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       140     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       150     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       200     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       250     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       300     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       400     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       500     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       600     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       700     0.000   0.00    none
>>>> > > 0.00    0.00        0.00        0.00    0.0000000       0.0000000       800     0.000   0.00    none
>>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1000    -0.607  75.78   none
>>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1200    -0.607  75.78   none
>>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1400    -0.607  75.78   none
>>>> > > 0.13    32659.00    12672.00    82.50   1.3512194e-08   1.0019413e-08   1500    -0.607  75.78   none
>>>> > > 0.24    43686.00    17924.00    329.50  1.0571799e-08   1.0032261e-08   2000    -0.487  82.65   none
>>>> > > 0.24    49753.00    21610.00    330.71  1.3770070e-08   1.0011902e-08   2500    -0.439  83.90   none
>>>> > > 0.24    49753.00    21610.00    330.71  1.3770070e-08   1.0011902e-08   3000    -0.439  83.90   none
>>>> > > 0.32    50635.00    28531.00    437.09  1.0551175e-08   1.0000001e-08   4000    -0.351  88.14   none
>>>> > > 0.32    50635.00    32642.00    437.09  1.0551175e-08   1.0000000e-08   5000    -0.378  87.10   none
>>>> > > 0.32    50635.00    36461.00    437.09  1.0556652e-08   1.0000001e-08   6000    -0.372  86.89   none
>>>> > > 0.32    50635.00    37768.00    437.09  1.0576742e-08   1.0000001e-08   7000    -0.334  89.26   none
>>>> > > 0.32    50635.00    38807.00    437.09  1.0576742e-08   1.0000000e-08   8000    -0.368  87.52   none
>>>> > > 0.32    50635.00    44731.00    437.09  1.0576716e-08   1.0000000e-08   10000   -0.374  87.39   none
>>>> > > 0.32    50635.00    45672.00    437.09  1.0576716e-08   1.0000000e-08   12000   -0.298  88.26   none
>>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> > >         at java.lang.Thread.run(Thread.java:701)
>>>> > >
>>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > > Trying out the build today
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com
>>>> > > >wrote:
>>>> > > >
>>>> > > >> This is an issue (trivial one though) that needs to be fixed for
>>>> 0.9
>>>> > > >> Release, will be rerolling the release today (in the next few
>>>> hrs) and
>>>> > > >> putting out a new release candidate in staging.
>>>> > > >>
>>>> > > >> Thanks for reporting this Andrew P.
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>>> > > ap.dev@outlook.com>
>>>> > > >> wrote:
>>>> > > >>
>>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).
>>>> > > >> I had a bit of trouble getting the Hadoop natives to compile, so
>>>> > > >> some of the problems below may come from my Hadoop setup. I ran
>>>> > > >> into problems in the example scripts, particularly with
>>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of
>>>> > > >> the examples once I'm sure I've got Hadoop set up right.
>>>> > > >>
>>>> > > >>
>>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
>>>> "amd64",
>>>> > > >> family: "unix"
>>>> > > >> $MAHOUT_LOCAL=true
>>>> > > >> Hadoop 2.2.0
>>>> > > >>
>>>> > > >>
>>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
>>>> > > >>
>>>> > > >> b) Verify u r able to compile the distro
>>>> > > >>
>>>> > > >>     mvn compile- [passed with warnings]
>>>> > > >>
>>>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
>>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
>>>> > > >>
>>>> > > >> c)  Run through the unit tests: mvn clean test
>>>> > > >>     mvn clean test [passed]
>>>> > > >>
>>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>>>> > > >> Please run through all the different options in each script
>>>> > > >>
>>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
>>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
>>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>>> > > >>     [...]
>>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >>         at java.lang.Class.forName0(Native Method)
>>>> > > >>         at java.lang.Class.forName(Class.java:171)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>>> > > >>
>>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >>         at java.lang.Class.forName0(Native Method)
>>>> > > >>         at java.lang.Class.forName(Class.java:171)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>>>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
>>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >>     cluster-reuters.sh ->1 [works]
>>>> > > >>     cluster-reuters.sh ->2 [works]
>>>> > > >>     cluster-reuters.sh ->3 [works]
>>>> > > >>
>>>> > > >>     Same error as noted previously in the thread:
>>>> > > >>
>>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
>>>> > > >>
>>>> > > >>     [...]
>>>> > > >>
>>>> > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>>>> > > >>     Num clusters: 0; maxDistance: 0.000000
>>>> > > >>     [Dunn Index] First: Infinity
>>>> > > >>     [Davies-Bouldin Index] First: NaN
>>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
>>>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>>>> > > >> > From: suneel_marthi@yahoo.com
>>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>>>> > > >> >
>>>> > > >> > Third time's a Charm!!!
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Here's the new URL for Mahout 0.9 Release:
>>>> > > >> >
>>>> > > >>
>>>> > >
>>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>>> > > >> >
>>>> > > >> > For those volunteering to test this, some of the things to be
>>>> > > verified:
>>>> > > >> >
>>>> > > >> > a) Verify that u can unpack the release (tar or zip)
>>>> > > >> > b) Verify u r able to compile the distro
>>>> > > >> > c)  Run through the unit tests: mvn clean test
>>>> > > >> > d) Run the example scripts
>>>> > > >>  under $MAHOUT_HOME/examples/bin. Please run through all the
>>>> different
>>>> > > >> options in each script.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Committers
>>>> > > >> >  and PMC members:
>>>> > > >> > ---------------------------------------
>>>> > > >> >
>>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Thanks and
>>>> > >  Regards.
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > >
>>>>
>>>
>>>
>>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Andrew M., Andrew P. and others,

Sebastian and I fixed a few issues today (for 0.9):

a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed earlier. Also removed references to, and invocations of, algorithms that are no longer in the codebase.
b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
c) Resurrected the Frequent Pattern Mining implementation for 0.9.

Please check out the latest code from trunk, run a build locally, and run through the example scripts.

Thanks and Regards.
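
For anyone who would rather script the run-through than type each menu choice by hand, here is a rough Python sketch. It assumes each example script reads its menu choice from standard input, which may not hold for every script, so check before relying on it. The script names and option numbers are only the ones reported as working earlier in this thread and may have changed after today's cleanup. SCRIPTS and run_all are made-up names for illustration, not part of Mahout.

```python
import subprocess
from pathlib import Path

# Script names and menu options as reported in this thread (assumption:
# they still exist under examples/bin after the cleanup; adjust as needed).
SCRIPTS = {
    "cluster-syntheticcontrol.sh": [1, 2, 3],
    "classify-20newsgroups.sh": [1, 2],
    "cluster-reuters.sh": [1, 2, 3, 4],
}


def run_all(bin_dir, scripts=SCRIPTS, runner=subprocess.run):
    """Run every (script, option) pair and collect the exit codes.

    Assumption: each script reads its menu choice from stdin.
    The runner argument exists so the loop can be tried without Mahout.
    """
    results = {}
    for name, options in scripts.items():
        for opt in options:
            proc = runner(
                ["bash", str(Path(bin_dir) / name)],
                input=f"{opt}\n",   # feed the menu choice on stdin
                text=True,
                cwd=str(bin_dir),
            )
            results[(name, opt)] = proc.returncode
    return results


# Example use (needs a built distribution):
#   results = run_all("/path/to/mahout/examples/bin")
#   for (name, opt), code in sorted(results.items()):
#       print(name, "->", opt, "ok" if code == 0 else "exit %d" % code)
```

A zero exit code only says the script finished; the output still needs a look, since a run can complete and report 0 clusters.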






On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
 
*factorize-movielens-1M.sh:*
RMSE is:

0.8519064098265133


Sample recommendations:

2229    [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
5848    [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
3728    [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
1252    [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
634     [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
4219    [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
502     [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
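
To compare these recommendations across runs, the records can be loaded with a few lines of Python. This is only a sketch, assuming each record is a user ID followed by a bracketed list of item:score pairs, as in the sample above. The function name parse_recommendations is made up for illustration.

```python
def parse_recommendations(text):
    """Parse 'userID [item:score,...]' records into {user: [(item, score), ...]}.

    Any whitespace may separate the ID from the list, including the line
    breaks a mail client inserts when it wraps long lines.
    """
    tokens = text.split()
    recs = {}
    # Tokens alternate: user ID, bracketed list, user ID, bracketed list, ...
    for user, items in zip(tokens[0::2], tokens[1::2]):
        pairs = items.strip("[]").split(",")
        recs[int(user)] = [
            (int(item), float(score))
            for item, score in (p.split(":") for p in pairs)
        ]
    return recs


# Three records copied from the output above, one of them wrapped.
sample = """5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
502
[953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]"""

recs = parse_recommendations(sample)
top_item, top_score = recs[502][0]
print(top_item, top_score)  # 953 5.0
```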

factorize-netflix.sh:
References a data set that Netflix took down after the competition; the
script should at least mention that the data set is no longer available
online.


On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> *clustering-syntheticcontrol.sh*
>
> *Canopy:*
> [snip]
>         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> 10.017, 17.149, 14.850, 10.890]
>         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> 16.430, 11.898, 15.366]
>         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
>         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
>         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
>         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
>         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
>
> *K-means:*
> [snip]
>         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> 36.946, 36.900, 32.593, 31.563]
>         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> 18.967, 35.473, 30.146, 45.527]
>         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> 41.024, 37.105, 45.623, 43.589]
>         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> 24.508, 34.432, 32.155, 34.839]
>         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> 18.111, 14.754, 27.386, 27.026]
>         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> 39.404, 38.034, 46.458, 44.432]
>         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> 28.215, 34.940, 36.910, 39.749]
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 16902 ms (Minutes: 0.2817)
>
> *Fuzzy k-means:*
> [snip]
>         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
>         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
>         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
>         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
>         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
>
> *Dirichlet and Meanshift:*
> Already detailed in MAHOUT-1400: deprecated jobs are still referenced.
>
>
>
> On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> *cluster-reuters.sh*
>> *k-means:*
>>
>> [snip]
>> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
>> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
>>         Top Terms:
>>                 banks                                   => 3.841823268955143
>>                 bank                                    => 3.80633066361209
>>                 debt                                    => 3.28065219870794
>>                 said                                    => 2.5965700942088583
>>                 he                                      => 2.335682813857497
>>                 foreign                                 => 2.2217853688201403
>>                 billion                                 => 2.1970193848291335
>>                 would                                   => 1.9932392063955617
>>                 loans                                   => 1.9309276792854233
>>                 interest                                => 1.787324501938
>>                 have                                    => 1.762981951432578
>>                 its                                     => 1.7615109954971866
>>                 which                                   => 1.5822081148036862
>>                 has                                     => 1.5600708189041956
>>                 dlrs                                    => 1.5571038313005996
>>                 finance                                 => 1.5539758811252924
>>                 new                                     => 1.5176015811577555
>>                 had                                     => 1.5138723701401844
>>                 brazil                                  => 1.5083369853593172
>>                 payments                                => 1.4539044255886517
>>         Weight : [props - optional]:  Point:
>>
>> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
>> 0.40:0.003, 0.5:0.009, 0.57:0
>>         Top Terms:
>>                 vs                                      => 6.126130791333171
>>                 net                                     => 4.012191567277523
>>                 cts                                     => 3.822006848832744
>>                 shr                                     => 3.6786004856764527
>>                 mln                                     => 2.9011643584038698
>>                 loss                                    => 2.788368861463607
>>                 qtr                                     => 2.714140225051522
>>                 revs                                    => 2.4739861236454717
>>                 profit                                  => 1.8146888090247015
>>                 note                                    => 1.7977163272138388
>>                 dlrs                                    => 1.6164390808155846
>>                 avg                                     => 1.3901765773336587
>>                 shrs                                    => 1.3856326531419314
>>                 mths                                    => 1.3168717272038506
>>                 4th                                     => 1.2161158425617289
>>                 oper                                    => 1.182419473776814
>>                 year                                    => 1.178086061733047
>>                 nine                                    => 1.0670554836445316
>>                 3rd                                     => 1.041334410056592
>>                 inc                                     => 1.0019361981554935
>>         Weight : [props - optional]:  Point:
>>
>>
>> Inter-Cluster Density: 0.45562152681859414
>> Intra-Cluster Density: 0.6952712632167628
>> CDbw Inter-Cluster Density: 0.0
>> CDbw Intra-Cluster Density: 16.486930227598684
>> CDbw Separation: 194.49005884464628
>>
>> *fuzzy k-means:*
>> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.005, 0.02:0.002, 0.0
>>         Top Terms:
>>                 said                                    => 1.8665592354713065
>>                 its                                     => 1.1335212213411592
>>                 pct                                     => 1.0862816801353348
>>                 dlrs                                    => 1.0854998884993752
>>                 mln                                     => 1.043163996400643
>>                 from                                    => 0.9684961110525736
>>                 has                                     => 0.912161511978058
>>                 company                                 => 0.8754186972808333
>>                 mar                                     => 0.8675333452422878
>>                 inc                                     => 0.7678617590362815
>>                 would                                   => 0.7610968883652675
>>                 he                                      => 0.7459988770503974
>>                 which                                   => 0.7435613119406804
>>                 year                                    => 0.7302840632748394
>>                 u.s                                     => 0.7281061062439116
>>                 shares                                  => 0.7260764102983083
>>                 corp                                    => 0.7179807367808658
>>                 new                                     => 0.7044203783157115
>>                 stock                                   => 0.6962010978721442
>>                 have                                    => 0.6464265467298506
>> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.004, 0.02:0.002, 0.02
>>         Top Terms:
>>                 said                                    => 1.864911184196927
>>                 dlrs                                    => 1.199286689822081
>>                 mln                                     => 1.1802134783562215
>>                 pct                                     => 1.1529704214798124
>>                 its                                     => 1.1184398851519701
>>                 from                                    => 1.016647848050332
>>                 company                                 => 0.894703604722841
>>                 mar                                     => 0.879986159541356
>>                 has                                     => 0.8642799128491316
>>                 year                                    => 0.8271823503717782
>>                 inc                                     => 0.7871293745341424
>>                 corp                                    => 0.737705498468879
>>                 which                                   => 0.722975201852743
>>                 would                                   => 0.708000816484415
>>                 u.s                                     => 0.7073294276173905
>>                 billion                                 => 0.7055723996916351
>>                 he                                      => 0.7042684217823294
>>                 new                                     => 0.6834737905434939
>>                 shares                                  => 0.6753327384172428
>>                 stock                                   => 0.6576225144041699
>> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.006, 0.02:0.002, 0.02
>>         Top Terms:
>>                 said                                    => 1.8796076179735086
>>                 its                                     => 1.172025965452378
>>                 dlrs                                    => 1.130422792460914
>>                 pct                                     => 1.082038255241358
>>                 mln                                     => 1.0772146872767114
>>                 company                                 => 0.9662235879639138
>>                 from                                    => 0.9473172871605616
>>                 has                                     => 0.9224712965830099
>>                 mar                                     => 0.8769325856924421
>>                 inc                                     => 0.8360245257169788
>>                 shares                                  => 0.8334595641384324
>>                 stock                                   => 0.7704621839612175
>>                 corp                                    => 0.7682400250301806
>>                 which                                   => 0.7389988207856137
>>                 would                                   => 0.7339708917389389
>>                 year                                    => 0.7088414843731325
>>                 new                                     => 0.7038109468655172
>>                 he                                      => 0.6993994455501005
>>                 u.s                                     => 0.6772649147622415
>>                 share                                   => 0.6241804830055171
>>
>> *lda:*
>>
>> [snip]
>> 21539
>> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
>> 21540
>> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
>> 21541
>> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
>> 21542
>> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
>> 21543
>> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
>> 21544
>> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
>> 21545
>> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
>> 21546
>> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
>> 21547
>> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
>> 21548
>> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
>> 21549
>> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
>> 21550
>> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
>> 21551
>> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
>> 21552
>> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
>> 21553
>> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
>> 21554
>> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
>> 21555
>> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
>> 21556
>> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
>> 21557
>> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
>> 21558
>> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
>> 21559
>> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
>> 21560
>> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
>> 21561
>> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
>> 21562
>> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
>> 21563
>> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
>> 21564
>> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
>> 21565
>> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
>> 21566
>> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
>> 21567
>> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
>> 21568
>> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
>> 21569
>> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
>> 21570
>> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
>> 21571
>> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
>> 21572
>> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
>> 21573
>> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
>> 21574
>> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
>> 21575
>> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
>> 21576
>> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
>> 21577
>> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>>
>>
>> *Streaming k-means:*
>>
>> [snip]
>> INFO: Number of Centroids: 0
>> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
>> WARNING: job_local23982482_0001
>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>
>> [snip]
>>
>> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> Num clusters: 0; maxDistance: 0.000000
>> [Dunn Index] First: Infinity
>> [Davies-Bouldin Index] First: NaN
>> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
>> cluster,distance.mean,distance.sd
>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
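A note on reading the streaming k-means exception above: the message still contains the raw `%.1f` and `%d` placeholders, with the values tacked on in square brackets. Guava's `Preconditions.checkArgument` substitutes only `%s` and appends any unused arguments in brackets, so the message amounts to "asked for about 10% of 0 vectors for test". That matches the "Number of Centroids: 0" line and suggests the reducer had no centroids to work with. The Python sketch below imitates that lenient formatting for illustration only; `lenient_format` is a hypothetical name, not Guava or Mahout code.

```python
def lenient_format(template, args):
    """Replace each "%s" with the next argument; append leftovers in brackets.

    This only imitates the behaviour visible in the log; it is not Guava's code.
    """
    args = [str(a) for a in args]
    parts = template.split("%s")
    # Number of "%s" placeholders that can actually be filled.
    used = min(len(parts) - 1, len(args))
    out = []
    for i, part in enumerate(parts):
        out.append(part)
        if i < len(parts) - 1:
            # Fill the placeholder if an argument is left, else keep it as-is.
            out.append(args[i] if i < used else "%s")
    message = "".join(out)
    leftovers = args[used:]
    if leftovers:
        # Arguments with no matching "%s" end up in a bracketed suffix.
        message += " [" + ", ".join(leftovers) + "]"
    return message


if __name__ == "__main__":
    # "%.1f" and "%d" are not "%s", so both values land in the suffix.
    print(lenient_format("Asked for %.1f %% of %d vectors for test", [10.5, 0]))
```

Only `%s` counts as a placeholder in this sketch, which is why a template written with `%.1f` and `%d` leaves every value in the bracketed suffix.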
>> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
>> andrew.musselman@gmail.com> wrote:
>>
>>> *classify-20newsgroups.sh*
>>>
>>> *Complementary naive bayes:*
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :      11207       98.9406%
>>> Incorrectly Classified Instances        :        120        1.0594%
>>> Total Classified Instances              :      11327
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
>>> 475  0    0    1    0    0    0    0    0    0    0    0    0    0    1    0    1    0    0    0    |  478  a = alt.atheism
>>> 0    597  1    1    0    1    1    0    0    0    0    1    0    2    1    0    0    0    0    0    |  605  b = comp.graphics
>>> 0    1    620  3    0    1    0    0    0    0    0    1    0    0    1    0    0    0    0    0    |  627  c = comp.os.ms-windows.misc
>>> 1    1    1    593  2    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    |  599  d = comp.sys.ibm.pc.hardware
>>> 0    1    1    0    568  0    1    0    0    0    1    1    2    0    0    0    0    1    0    0    |  576  e = comp.sys.mac.hardware
>>> 0    4    2    0    0    581  0    0    0    0    0    0    0    0    0    0    0    0    0    0    |  587  f = comp.windows.x
>>> 0    0    0    1    2    0    571  3    0    0    1    1    4    1    0    0    0    0    0    0    |  584  g = misc.forsale
>>> 0    0    0    1    0    0    0    589  1    0    0    1    1    0    0    0    0    0    0    0    |  593  h = rec.autos
>>> 0    0    0    0    0    0    0    1    565  0    0    0    0    0    1    0    0    0    0    0    |  567  i = rec.motorcycles
>>> 0    0    0    0    0    0    0    0    0    600  2    0    0    0    1    0    0    0    0    0    |  603  j = rec.sport.baseball
>>> 0    0    0    0    0    0    0    0    0    1    584  0    0    0    0    0    0    0    0    0    |  585  k = rec.sport.hockey
>>> 0    0    0    0    0    0    0    0    0    0    0    579  0    0    0    0    0    1    0    0    |  580  l = sci.crypt
>>> 0    0    0    1    3    0    2    0    0    2    0    0    567  1    2    1    0    0    0    0    |  579  m = sci.electronics
>>> 0    0    0    0    0    0    0    0    0    0    0    0    1    605  0    0    0    0    0    0    |  606  n = sci.med
>>> 0    0    0    0    0    0    0    0    0    0    0    0    0    0    602  0    0    0    0    0    |  602  o = sci.space
>>> 0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    602  0    0    1    0    |  604  p = soc.religion.christian
>>> 0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    556  0    0    0    |  556  q = talk.politics.mideast
>>> 0    0    1    0    0    0    0    0    0    0    0    1    0    0    1    0    0    568  0    0    |  571  r = talk.politics.guns
>>> 11   0    0    0    0    0    0    0    0    1    0    0    0    1    3    8    1    4    338  2    |  369  s = talk.religion.misc
>>> 0    0    0    0    0    0    0    0    0    0    1    0    0    0    1    0    3    4    0    447  |  456  t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.9806
>>> Accuracy                                   98.9406%
>>> Reliability                                94.0932%
>>> Reliability (standard deviation)            0.2163
>>>
>>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 15870 ms (Minutes: 0.2645)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>>>
>>> [snip]
>>>
>>> INFO: Complementary Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       6715       89.3071%
>>> Incorrectly Classified Instances        :        804       10.6929%
>>> Total Classified Instances              :       7519
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
>>> 298  0    0    0    0    0    0    0    0    1    0    0    0    1    2    5    1    0    13   0    |  321  a = alt.atheism
>>> 0    298  11   6    1    12   2    2    1    1    3    8    3    4    2    4    1    4    4    1    |  368  b = comp.graphics
>>> 1    17   286  16   4    9    6    3    2    0    1    0    1    7    1    0    2    1    0    1    |  358  c = comp.os.ms-windows.misc
>>> 2    6    11   309  9    5    14   8    1    0    2    0    6    4    2    0    1    2    1    0    |  383  d = comp.sys.ibm.pc.hardware
>>> 0    10   8    7    334  7    5    5    2    0    3    0    2    1    1    0    1    1    0    0    |  387  e = comp.sys.mac.hardware
>>> 1    13   7    8    2    355  2    0    2    0    0    5    1    1    3    0    0    1    0    0    |  401  f = comp.windows.x
>>> 0    7    11   29   12   9    268  16   8    4    3    2    6    4    2    1    3    1    2    3    |  391  g = misc.forsale
>>> 0    1    0    0    3    0    7    362  8    2    2    1    2    0    2    0    1    2    0    4    |  397  h = rec.autos
>>> 0    0    0    1    0    0    1    0    423  0    0    0    2    1    0    1    0    0    0    0    |  429  i = rec.motorcycles
>>> 0    0    1    0    0    0    0    2    2    371  8    0    2    3    0    2    0    0    0    0    |  391  j = rec.sport.baseball
>>> 0    0    1    0    0    0    1    0    0    2    409  0    0    0    0    0    0    0    0    1    |  414  k = rec.sport.hockey
>>> 0    0    1    2    1    0    1    0    0    0    0    404  0    0    0    0    0    1    0    1    |  411  l = sci.crypt
>>> 0    5    4    11   1    3    7    9    2    5    3    3    339  2    6    0    1    1    2    1    |  405  m = sci.electronics
>>> 0    4    0    1    0    0    0    1    0    1    1    0    3    367  3    1    2    0    0    0    |  384  n = sci.med
>>> 0    1    2    0    1    0    2    0    0    1    0    0    1    1    375  0    1    0    0    0    |  385  o = sci.space
>>> 4    2    1    1    0    0    1    1    2    0    0    1    1    5    1    367  4    0    1    1    |  393  p = soc.religion.christian
>>> 0    1    0    0    0    0    0    0    0    2    0    0    0    0    0    2    378  0    1    0    |  384  q = talk.politics.mideast
>>> 0    0    0    0    0    2    1    1    1    1    0    3    0    3    0    0    2    319  2    4    |  339  r = talk.politics.guns
>>> 32   0    0    1    0    0    0    0    0    1    1    1    0    2    2    26   5    7    175  6    |  259  s = talk.religion.misc
>>> 0    0    0    2    0    0    0    0    0    1    2    2    0    1    2    1    10   18   2    278  |  319  t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.8594
>>> Accuracy                                   89.3071%
>>> Reliability                                 84.611%
>>> Reliability (standard deviation)            0.2148
>>>
>>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>>
>>>
>>> *Naive bayes:*
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :      11286       99.0869%
>>> Incorrectly Classified Instances        :        104        0.9131%
>>> Total Classified Instances              :      11390
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 474     0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       0       0       0       2
>>>    1         |  477         a     = alt.atheism
>>> 0       566     0       2       0       1       0       0       0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  569         b     = comp.graphics
>>> 0       10      590     29      2       4       1       0       0
>>> 0       0       0       1       0       0       0       0       0       0
>>>    1         |  638         c     = comp.os.ms-windows.misc
>>> 0       0       0       596     0       0       0       0       0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  596         d     = comp.sys.ibm.pc.hardware
>>> 0       0       0       0       575     0       1       0       0
>>> 0       0       0       1       0       0       0       0       0       0
>>>    0         |  577         e     = comp.sys.mac.hardware
>>> 0       2       2       2       0       593     1       0       0
>>> 0       0       0       0       0       1       0       0       0       0
>>>    0         |  601         f     = comp.windows.x
>>> 0       0       0       1       0       0       589     1       0
>>> 0       1       0       2       0       0       0       0       0       0
>>>    0         |  594         g     = misc.forsale
>>> 0       0       0       0       0       0       0       594     0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  594         h     = rec.autos
>>> 0       0       0       0       0       0       0       0       611
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  611         i     = rec.motorcycles
>>> 0       0       0       0       0       0       0       0       0
>>> 616     1       0       0       0       0       0       0       0       0
>>>    0         |  617         j     = rec.sport.baseball
>>> 0       0       0       0       0       0       1       0       0
>>> 0       620     0       0       0       0       0       0       0       0
>>>    0         |  621         k     = rec.sport.hockey
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       580     0       0       0       0       0       1       0
>>>    0         |  581         l     = sci.crypt
>>> 0       0       0       3       1       0       0       0       0
>>> 0       0       0       571     0       0       0       0       0       0
>>>    0         |  575         m     = sci.electronics
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       2       583     0       0       0       0       0
>>>    0         |  585         n     = sci.med
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       0       1       599     0       0       0       0
>>>    0         |  600         o     = sci.space
>>> 0       1       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       615     0       0       0
>>>    0         |  616         p     = soc.religion.christian
>>> 1       0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       1       560     0       0
>>>    0         |  562         q     = talk.politics.mideast
>>> 0       0       1       0       0       0       0       0       0
>>> 0       0       1       0       0       0       0       0       548     0
>>>    1         |  551         r     = talk.politics.guns
>>> 10      0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       1       1       0       2       344
>>>    1         |  359         s     = talk.religion.misc
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       1       1       0       0       0       0       2       0
>>>    462       |  466         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.9847
>>> Accuracy                                   99.0869%
>>> Reliability                                94.3334%
>>> Reliability (standard deviation)            0.2169
>>>
>>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 14304 ms (Minutes: 0.2384)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>>
>>> [snip]
>>>
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       6718       90.1019%
>>> Incorrectly Classified Instances        :        738        9.8981%
>>> Total Classified Instances              :       7456
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 294     0       0       0       0       0       0       0       0
>>> 0       0       2       0       1       1       6       1       1       16
>>>     0         |  322         a     = alt.atheism
>>> 0       345     6       14      6       11      6       0       0
>>> 0       0       5       7       1       3       0       0       0       0
>>>    0         |  404         b     = comp.graphics
>>> 2       29      177     78      22      19      9       1       0
>>> 0       0       4       2       0       1       1       0       0       1
>>>    1         |  347         c     = comp.os.ms-windows.misc
>>> 1       9       2       335     18      2       10      0       0
>>> 0       1       0       8       0       0       0       0       0       0
>>>    0         |  386         d     = comp.sys.ibm.pc.hardware
>>> 1       4       2       13      347     3       5       1       0
>>> 0       1       0       7       1       0       0       0       1       0
>>>    0         |  386         e     = comp.sys.mac.hardware
>>> 0       20      0       4       0       352     4       0       0
>>> 0       0       0       1       1       3       0       1       0       1
>>>    0         |  387         f     = comp.windows.x
>>> 0       2       0       21      5       1       323     7       2
>>> 2       0       2       12      0       3       0       0       0       0
>>>    1         |  381         g     = misc.forsale
>>> 0       1       0       0       1       0       15      363     8
>>> 1       0       0       4       1       0       0       0       1       0
>>>    1         |  396         h     = rec.autos
>>> 0       1       0       0       0       0       6       6       370
>>> 0       0       0       0       1       0       0       0       0       1
>>>    0         |  385         i     = rec.motorcycles
>>> 1       0       0       1       1       0       2       1       2
>>> 362     5       0       2       0       0       0       0       0       0
>>>    0         |  377         j     = rec.sport.baseball
>>> 0       0       0       1       2       0       0       0       0
>>> 3       371     0       0       0       0       0       0       0       0
>>>    1         |  378         k     = rec.sport.hockey
>>> 0       3       1       0       1       0       2       0       0
>>> 0       0       396     0       1       0       0       1       1       1
>>>    3         |  410         l     = sci.crypt
>>> 0       7       0       7       7       2       6       4       0
>>> 0       0       1       369     2       2       0       0       0       0
>>>    2         |  409         m     = sci.electronics
>>> 0       3       0       2       1       0       2       0       0
>>> 0       0       1       4       383     4       0       0       1       0
>>>    4         |  405         n     = sci.med
>>> 0       5       0       0       1       0       3       0       0
>>> 0       0       0       1       0       374     1       0       0       1
>>>    1         |  387         o     = sci.space
>>> 6       2       0       1       1       0       0       1       0
>>> 1       0       0       1       5       0       352     2       1       7
>>>    1         |  381         p     = soc.religion.christian
>>> 1       1       0       0       0       0       0       0       0
>>> 0       1       0       0       0       0       0       373     1       0
>>>    1         |  378         q     = talk.politics.mideast
>>> 0       0       0       0       0       0       1       0       1
>>> 0       0       2       0       0       0       0       0       346     2
>>>    7         |  359         r     = talk.politics.guns
>>> 26      1       0       1       0       0       0       2       0
>>> 1       1       0       0       1       1       20      2       6       200
>>>    7         |  269         s     = talk.religion.misc
>>> 1       0       0       0       0       0       0       2       0
>>> 0       1       0       0       2       2       0       1       14      0
>>>    286       |  309         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.8726
>>> Accuracy                                   90.1019%
>>> Reliability                                85.4491%
>>> Reliability (standard deviation)            0.2222
>>>
>>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>>
>>> *SGD:*
>>> 7532 test files
>>>
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       5649            75%
>>> Incorrectly Classified Instances        :       1883            25%
>>> Total Classified Instances              :       7532
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 186     6       3       10      5       0       33      4       13
>>>  15      7       1       24      15      3       15      5       5       29
>>>     15        |  394         a     = sci.space
>>> 5       309     0       3       2       5       0       0       0
>>> 1       9       21      2       0       0       18      4       4       1
>>>    1         |  385         b     = comp.sys.mac.hardware
>>> 4       1       101     3       0       1       63      0       7
>>> 0       1       1       5       16      3       0       3       7       1
>>>    34        |  251         c     = talk.religion.misc
>>> 11      12      1       265     1       10      3       0       0
>>> 17      10      11      5       2       0       11      3       6       21
>>>     0         |  389         d     = comp.graphics
>>> 2       1       1       0       349     2       3       0       3
>>> 2       6       1       5       1       0       2       15      2       1
>>>    2         |  398         e     = rec.motorcycles
>>> 7       20      3       19      2       254     6       0       2
>>> 11      2       39      7       2       0       4       2       2       9
>>>    3         |  394         f     = comp.os.ms-windows.misc
>>> 2       1       13      0       0       0       247     0       1
>>> 1       3       0       6       2       4       0       2       3       5
>>>    29        |  319         g     = alt.atheism
>>> 1       1       0       0       2       0       2       361     0
>>> 1       2       0       2       0       0       1       3       22      0
>>>    1         |  399         h     = rec.sport.hockey
>>> 3       0       3       1       0       0       5       0       161
>>> 0       1       2       12      102     0       0       1       2       11
>>>     6         |  310         i     = talk.politics.misc
>>> 2       8       0       19      0       19      0       0       1
>>> 294     10      11      4       2       0       5       0       3       11
>>>     6         |  395         j     = comp.windows.x
>>> 2       10      0       1       1       0       0       0       0
>>> 1       347     13      2       1       0       5       3       2       2
>>>    0         |  390         k     = misc.forsale
>>> 1       36      0       6       1       25      0       0       1
>>> 6       10      257     2       1       0       34      6       0       6
>>>    0         |  392         l     = comp.sys.ibm.pc.hardware
>>> 2       2       2       2       1       0       12      0       0
>>> 6       10      4       312     5       2       13      11      3       3
>>>    6         |  396         m     = sci.med
>>> 2       0       3       2       1       0       0       1       13
>>>  0       5       1       2       314     2       0       2       2       10
>>>     4         |  364         n     = talk.politics.guns
>>> 1       0       2       1       1       0       34      1       33
>>>  1       3       0       1       8       271     1       4       5       6
>>>      3         |  376         o     = talk.politics.mideast
>>> 3       14      0       8       2       8       3       1       1
>>> 7       12      29      6       2       1       245     13      2       32
>>>     4         |  393         p     = sci.electronics
>>> 3       3       0       2       11      0       1       0       2
>>> 1       11      6       4       2       0       11      330     4       4
>>>    1         |  396         q     = rec.autos
>>> 0       0       1       0       1       0       4       12      3
>>> 1       3       0       0       0       0       5       6       359     1
>>>    1         |  397         r     = rec.sport.baseball
>>> 0       1       0       0       0       1       0       0       3
>>> 3       0       0       3       2       1       6       1       6       366
>>>    3         |  396         s     = sci.crypt
>>> 0       2       11      1       1       0       40      0       1
>>> 2       3       4       2       1       0       5       0       2       2
>>>    321       |  398         t     = soc.religion.christian
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.7073
>>> Accuracy                                        75%
>>> Reliability                                70.6238%
>>> Reliability (standard deviation)            0.2187
>>> Log-likelihood                mean      :    -1.1182
>>>                               25%-ile   :    -1.6911
>>>                               75%-ile   :    -0.0803
>>>
>>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>>
>>>
>>>
>>>
>>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>>
>>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
>>>> and a few other issues.
>>>>
>>>> We have removed asf-examples*.sh from trunk because the sample file at the
>>>> URL mentioned in your email is no longer available.
>>>> This is something we need to fix and restore in 1.0.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
>>>> ap.dev@outlook.com> wrote:
>>>>
>>>> from the asf-email-examples.sh script:
>>>>
>>>> # You will need to download or otherwise obtain some or all of the
>>>> # Amazon ASF Email Public Dataset
>>>> # (http://aws.amazon.com/datasets/7791434387204566) to use this script.
>>>> # To obtain a full copy you will need to launch an EC2 instance and
>>>> # mount the dataset to download it, otherwise you can get a sample of it at
>>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> It looks like this link is down:
>>>>
>>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>>
>>>>
>>>>
>>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>>> > From: andrew.musselman@gmail.com
>>>> > To: dev@mahout.apache.org
>>>> >
>>>> > Sure thing; continuing to smoke test the other examples tonight
>>>> >
>>>> >
>>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com>wrote:
>>>> >
>>>> > > Thanks, Andrew M. I see that some of the example scripts need to be
>>>> > > fixed, as they still refer to deprecated algorithms.
>>>> > > I see that Streaming KMeans failed for you as well.
>>>> > >
>>>> > > I'll be rolling back the release today to fix these issues.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
>>>> > > 64-bit Linux AMI from tarball.
>>>> > >
>>>> > > All tests pass.
>>>> > >
>>>> > > *Output of examples:*
>>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>>> > > <http://mahout.apache.org>:*
>>>> > > *recommendations:*
>>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
>>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
>>>> > > 1
>>>> > >
>>>> > >
>>>> [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>>> > > 4
>>>> > >
>>>> > >
>>>> [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>>> > > 6
>>>> > >
>>>> > >
>>>> [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>>> > > 8
>>>> > >     [12758:1.0,19409:1.0,11112:1.0]
>>>> > > 11
>>>> > >
>>>> > >
>>>> [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>>> > > 14
>>>> > >
>>>> > >
>>>> [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>>> > > 15
>>>> > >
>>>> > >
>>>> [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>>> > > 16
>>>> > >
>>>> > >
>>>> [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>>> > > 18
>>>> > >
>>>> > >
>>>> [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>>> > > 20
>>>> > >
>>>> > >
>>>> [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>>> > > [snip]
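
For anyone who wants to inspect the recommendations output programmatically rather than through `less`, here is a minimal Python sketch. It assumes each record is a user ID followed by whitespace and a bracketed, comma-separated list of itemID:score pairs on a single line, as in the `8 [12758:1.0,...]` record above (the mail client has wrapped most of the other records). The function name `parse_recommendations` is hypothetical and is not part of Mahout.

```python
def parse_recommendations(line):
    """Parse one 'userID [item:score,item:score,...]' record.

    Returns (user_id, [(item_id, score), ...]).
    The record layout is an assumption based on the pasted output.
    """
    user_part, bracket, items_part = line.strip().partition("[")
    if not bracket or not items_part.endswith("]"):
        raise ValueError(f"unexpected record format: {line!r}")

    user_id = int(user_part.strip())
    pairs = []
    for entry in items_part[:-1].split(","):
        if not entry:
            continue  # tolerate an empty list such as '[]'
        item_id, score = entry.split(":")
        pairs.append((int(item_id), float(score)))
    return user_id, pairs


if __name__ == "__main__":
    # Record copied from the output above, with a tab as the separator.
    user, recs = parse_recommendations("8\t[12758:1.0,19409:1.0,11112:1.0]")
    print(user, recs)
```

Since every score in the pasted sample is 1.0, a parser like this is mainly useful for counting how many items each user received.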
>>>> > >
>>>> > > *clustering; kmeans:*
>>>> > > [snip]
>>>> > >         Weight : [props - optional]:  Point:
>>>> > >         1.0 : [distance-squared=1.0193102046188427]:
>>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
>>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>>> 7573:0.204,
>>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>>> 9779:0.159,
>>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>>> > >         1.0 : [distance-squared=0.9823018320457279]:
>>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>>> 5336:0.106,
>>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>>> 7832:0.072,
>>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>>> > >         1.0 : [distance-squared=0.9509142993214911]:
>>>> > >
>>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
>>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
>>>> > >  4419:0.076,
>>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>>>> 7235:0.048,
>>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>>> 7683:0.077,
>>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>>> 10225:0.081,
>>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
>>>> > >  43685:0.086, 44077:0.308,
>>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>>> > > [snip]
>>>> > >
>>>> > > *clustering; dirichlet:*
>>>> > > I get this complaint:
>>>> > > Running Dirichlet with K = 8
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
>>>> > > classpath, will use command-line arguments only
>>>> > > Unknown program 'dirichlet' chosen.
>>>> > >
>>>> > > *clustering: minhash:*
>>>> > > Running Minhash
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
>>>> > > classpath, will use command-line arguments only
>>>> > > Unknown program 'minhash' chosen.
>>>> > >
>>>> > > *classification; standard:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances          :       5384       87.7874%
>>>> > > Incorrectly Classified Instances        :        749       12.2126%
>>>> > > Total Classified Instances              :       6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a       b       c       d       <--Classified as
>>>> > > 2949    7       531     25       |  3512        a     = dev
>>>> > > 0       0       0       0        |  0           b     = general
>>>> > > 99      8       1763    8        |  1878        c     = user
>>>> > > 41      1       29      672      |  743         d     = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa                                       0.7877
>>>> > > Accuracy                                   87.7874%
>>>> > > Reliability                                 53.658%
>>>> > > Reliability (standard deviation)            0.4911
>>>> > >
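
As a sanity check on the summary figures, the four-class confusion matrix above can be recomputed by hand. The Python sketch below assumes rows are actual classes and columns are predicted classes (which is consistent with the row totals printed after the `|`). The helper names are hypothetical. The kappa function uses the textbook Cohen's kappa formula; this thread does not show how Mahout computes its Kappa line, so the two values need not agree exactly.

```python
# Confusion matrix copied from the "classification; standard" output above.
# Assumption: rows = actual class, columns = predicted class (a, b, c, d).
LABELS = ["dev", "general", "user", "commits"]
MATRIX = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]


def accuracy(matrix):
    """Fraction of instances on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Textbook Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    observed = accuracy(matrix)
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


def recall_per_class(matrix, labels=LABELS):
    """Per-class recall; None for a class with no actual instances."""
    result = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        result[label] = matrix[i][i] / row_total if row_total else None
    return result


if __name__ == "__main__":
    print(f"accuracy: {accuracy(MATRIX) * 100:.4f}%")
    print(f"kappa (textbook formula): {cohen_kappa(MATRIX):.4f}")
    print(recall_per_class(MATRIX))
```

The `general` class has no test instances in this run, so its recall is undefined; the sketch returns `None` for it instead of dividing by zero.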
>>>> > > *classification; complementary:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances          :       5530       90.1679%
>>>> > > Incorrectly Classified Instances        :        603        9.8321%
>>>> > > Total Classified Instances              :       6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a       b       c       d       <--Classified as
>>>> > > 3168    0       276     68       |  3512        a     = dev
>>>> > > 0       0       0       0        |  0           b     = general
>>>> > > 196     0       1652    30       |  1878        c     = user
>>>> > > 25      0       8       710      |  743         d     = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa                                       0.8259
>>>> > > Accuracy                                   90.1679%
>>>> > > Reliability                                54.7459%
>>>> > > Reliability (standard deviation)            0.5005
>>>> > >
>>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms
>>>> > > (Minutes: 0.34836666666666666)
>>>> > >
>>>> > > *classification; sgd, with three categories:*
>>>> > > Running SGD Training
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
>>>> > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
>>>> > > will use command-line arguments only
>>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
>>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
>>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
>>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
>>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>>> > > 24168 training files
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
>>>> > > 0.000   0.00    none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1000    -0.607  75.78   none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1200    -0.607  75.78   none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1400    -0.607  75.78   none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1500    -0.607  75.78   none
>>>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
>>>> > > 1.0032261e-08   2000    -0.487  82.65   none
>>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
>>>> > > 1.0011902e-08   2500    -0.439  83.90   none
>>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
>>>> > > 1.0011902e-08   3000    -0.439  83.90   none
>>>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
>>>> > > 1.0000001e-08   4000    -0.351  88.14   none
>>>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
>>>> > > 1.0000000e-08   5000    -0.378  87.10   none
>>>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
>>>> > > 1.0000001e-08   6000    -0.372  86.89   none
>>>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
>>>> > > 1.0000001e-08   7000    -0.334  89.26   none
>>>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
>>>> > > 1.0000000e-08   8000    -0.368  87.52   none
>>>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
>>>> > > 1.0000000e-08   10000   -0.374  87.39   none
>>>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
>>>> > > 1.0000000e-08   12000   -0.298  88.26   none
>>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> > >         at java.lang.Thread.run(Thread.java:701)
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > > Trying out the build today
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com
>>>> > > >wrote:
>>>> > > >
>>>> > > >> This is an issue (a trivial one, though) that needs to be fixed for
>>>> > > >> the 0.9 release. I will be rerolling the release today (in the next
>>>> > > >> few hrs) and putting out a new release candidate in staging.
>>>> > > >>
>>>> > > >> Thanks for reporting this, Andrew P.
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>>> > > ap.dev@outlook.com>
>>>> > > >> wrote:
>>>> > > >>
>>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).
>>>> > > >> I had a bit of trouble getting the Hadoop natives to compile, so some
>>>> > > >> of the problems below may be due to my Hadoop setup. I ran into
>>>> > > >> problems in the example scripts, particularly with
>>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of
>>>> > > >> the examples when I'm sure I've got Hadoop set up right.
>>>> > > >>
>>>> > > >>
>>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
>>>> > > >> $MAHOUT_LOCAL=true
>>>> > > >> Hadoop 2.2.0
>>>> > > >>
>>>> > > >>
>>>> > > >> a) Verify that u can unpack the release (tar or zip) ... passed (tar)
>>>> > > >> [passed]
>>>> > > >>
>>>> > > >> b) Verify u r able to compile the distro
>>>> > > >>
>>>> > > >>     mvn compile - [passed with warnings]
>>>> > > >>
>>>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
>>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
>>>> > > >>
>>>> > > >> c)  Run through the unit tests: mvn clean test
>>>> > > >>     mvn clean test [passed]
>>>> > > >>
>>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>>>> > > >> Please run through all the different options in each script
>>>> > > >>
>>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
>>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
>>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>>> > > >>     [...]
>>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >>         at java.lang.Class.forName0(Native Method)
>>>> > > >>         at java.lang.Class.forName(Class.java:171)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>>> > > >>
>>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >>         at java.lang.Class.forName0(Native Method)
>>>> > > >>         at java.lang.Class.forName(Class.java:171)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>>>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
>>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >>     cluster-reuters.sh ->1 [works]
>>>> > > >>     cluster-reuters.sh ->2 [works]
>>>> > > >>     cluster-reuters.sh ->3 [works]
>>>> > > >>
>>>> > > >>     Same error as noted previously in the thread:
>>>> > > >>
>>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
>>>> > > >>
>>>> > > >>     [...]
>>>> > > >>
>>>> > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>>>> > > >>     Num clusters: 0; maxDistance: 0.000000
>>>> > > >>     [Dunn Index] First: Infinity
>>>> > > >>     [Davies-Bouldin Index] First: NaN
>>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
>>>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>>>> > > >> > From: suneel_marthi@yahoo.com
>>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>>>> > > >> >
>>>> > > >> > Third time's a Charm!!!
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Here's the new URL for Mahout 0.9 Release:
>>>> > > >> >
>>>> > > >>
>>>> > >
>>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>>> > > >> >
>>>> > > >> > For those volunteering to test this, some of the things to be
>>>> > > verified:
>>>> > > >> >
>>>> > > >> > a) Verify that u can unpack the release (tar or zip)
>>>> > > >> > b) Verify u r able to compile the distro
>>>> > > >> > c)  Run through the unit tests: mvn clean test
>>>> > > >> > d) Run the example scripts
>>>> > > >>  under $MAHOUT_HOME/examples/bin. Please run through all the
>>>> different
>>>> > > >> options in each script.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Committers
>>>> > > >> >  and PMC members:
>>>> > > >> > ---------------------------------------
>>>> > > >> >
>>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Thanks and
>>>> > >  Regards.
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > >
>>>>
>>>
>>>
>>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
*factorize-movielens-1M.sh:*
RMSE is:

0.8519064098265133
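
For anyone comparing runs, here is a minimal sketch of how a root-mean-square error like the one above is computed from held-out ratings. This is not the Mahout evaluation code: the function name `rmse` and the sample ratings are made up for illustration only.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (actual, predicted) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (actual, predicted) pair")
    # Sum of squared differences between actual and predicted ratings.
    squared_error = sum((actual - predicted) ** 2 for actual, predicted in pairs)
    return math.sqrt(squared_error / len(pairs))


# Hypothetical held-out ratings as (actual, predicted); not taken from MovieLens.
sample = [(5.0, 4.0), (3.0, 3.0), (1.0, 2.0), (4.0, 4.0)]
print(rmse(sample))
```

A lower value means predicted ratings sit closer to the held-out ones, on the same scale as the ratings themselves.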


Sample recommendations:

2229    [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
5848    [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
3728    [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
1252    [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
634     [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
2276    [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
4219    [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
91      [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
502     [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
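
The sample recommendations above appear to follow a `userID [itemID:score,...]` layout. A minimal sketch for parsing one such record, in case anyone wants to diff results between release candidates, might look like this. The layout is inferred from the output above rather than from Mahout documentation, and `parse_recommendation` is a hypothetical helper name.

```python
def parse_recommendation(line):
    """Parse 'userID [item:score,...]' into (user_id, [(item_id, score), ...])."""
    user_part, _, items_part = line.strip().partition("[")
    if not items_part.endswith("]"):
        raise ValueError("expected a bracketed item list: %r" % line)
    user_id = int(user_part.strip())
    body = items_part[:-1].strip()
    items = []
    if body:
        # Each entry is assumed to be 'itemID:predictedScore'.
        for entry in body.split(","):
            item_id, score = entry.split(":")
            items.append((int(item_id), float(score)))
    return user_id, items


# One record copied from the output above.
record = "5516    [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]"
user_id, items = parse_recommendation(record)
print(user_id, items[:2])
```

Records that the mail client wrapped onto two lines would need to be rejoined before parsing.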

*factorize-netflix.sh:*
References a data set that is no longer available; Netflix took it down
after the competition. The script should at least mention that the data
set is no longer online.


On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> *cluster-syntheticcontrol.sh*
>
> *Canopy:*
> [snip]
>         1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> 10.017, 17.149, 14.850, 10.890]
>         1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> 16.430, 11.898, 15.366]
>         1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
>         1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
>         1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
>         1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
>         1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
>
> *K-means:*
> [snip]
>         1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> 36.946, 36.900, 32.593, 31.563]
>         1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> 18.967, 35.473, 30.146, 45.527]
>         1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> 41.024, 37.105, 45.623, 43.589]
>         1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> 24.508, 34.432, 32.155, 34.839]
>         1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> 18.111, 14.754, 27.386, 27.026]
>         1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> 39.404, 38.034, 46.458, 44.432]
>         1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> 28.215, 34.940, 36.910, 39.749]
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 16902 ms (Minutes: 0.2817)
>
> *Fuzzy k-means:*
> [snip]
>         1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
>         1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
>         1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
>         1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
>         1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
>
> *Dirichlet and Meanshift:*
> Already detailed in M-1400; the script still references the deprecated jobs.
>
>
>
> On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> *cluster-reuters.sh*
>> *k-means:*
>>
>> [snip]
>> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
>> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
>>         Top Terms:
>>                 banks                                   =>
>> 3.841823268955143
>>                 bank                                    =>
>>  3.80633066361209
>>                 debt                                    =>
>>  3.28065219870794
>>                 said                                    =>
>>  2.5965700942088583
>>                 he                                      =>
>> 2.335682813857497
>>                 foreign                                 =>
>>  2.2217853688201403
>>                 billion                                 =>
>>  2.1970193848291335
>>                 would                                   =>
>>  1.9932392063955617
>>                 loans                                   =>
>>  1.9309276792854233
>>                 interest                                =>
>>  1.787324501938
>>                 have                                    =>
>> 1.762981951432578
>>                 its                                     =>
>>  1.7615109954971866
>>                 which                                   =>
>>  1.5822081148036862
>>                 has                                     =>
>>  1.5600708189041956
>>                 dlrs                                    =>
>>  1.5571038313005996
>>                 finance                                 =>
>>  1.5539758811252924
>>                 new                                     =>
>>  1.5176015811577555
>>                 had                                     =>
>>  1.5138723701401844
>>                 brazil                                  =>
>>  1.5083369853593172
>>                 payments                                =>
>>  1.4539044255886517
>>         Weight : [props - optional]:  Point:
>>
>> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
>> 0.40:0.003, 0.5:0.009, 0.57:0
>>         Top Terms:
>>                 vs                                      =>
>> 6.126130791333171
>>                 net                                     =>
>> 4.012191567277523
>>                 cts                                     =>
>> 3.822006848832744
>>                 shr                                     =>
>>  3.6786004856764527
>>                 mln                                     =>
>>  2.9011643584038698
>>                 loss                                    =>
>> 2.788368861463607
>>                 qtr                                     =>
>> 2.714140225051522
>>                 revs                                    =>
>>  2.4739861236454717
>>                 profit                                  =>
>>  1.8146888090247015
>>                 note                                    =>
>>  1.7977163272138388
>>                 dlrs                                    =>
>>  1.6164390808155846
>>                 avg                                     =>
>>  1.3901765773336587
>>                 shrs                                    =>
>>  1.3856326531419314
>>                 mths                                    =>
>>  1.3168717272038506
>>                 4th                                     =>
>>  1.2161158425617289
>>                 oper                                    =>
>> 1.182419473776814
>>                 year                                    =>
>> 1.178086061733047
>>                 nine                                    =>
>>  1.0670554836445316
>>                 3rd                                     =>
>> 1.041334410056592
>>                 inc                                     =>
>>  1.0019361981554935
>>         Weight : [props - optional]:  Point:
>>
>>
>> Inter-Cluster Density: 0.45562152681859414
>> Intra-Cluster Density: 0.6952712632167628
>> CDbw Inter-Cluster Density: 0.0
>> CDbw Intra-Cluster Density: 16.486930227598684
>> CDbw Separation: 194.49005884464628
>>
>> *fuzzy k-means:*
>> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.005, 0.02:0.002, 0.0
>>         Top Terms:
>>                 said                                    => 1.8665592354713065
>>                 its                                     => 1.1335212213411592
>>                 pct                                     => 1.0862816801353348
>>                 dlrs                                    => 1.0854998884993752
>>                 mln                                     => 1.043163996400643
>>                 from                                    => 0.9684961110525736
>>                 has                                     => 0.912161511978058
>>                 company                                 => 0.8754186972808333
>>                 mar                                     => 0.8675333452422878
>>                 inc                                     => 0.7678617590362815
>>                 would                                   => 0.7610968883652675
>>                 he                                      => 0.7459988770503974
>>                 which                                   => 0.7435613119406804
>>                 year                                    => 0.7302840632748394
>>                 u.s                                     => 0.7281061062439116
>>                 shares                                  => 0.7260764102983083
>>                 corp                                    => 0.7179807367808658
>>                 new                                     => 0.7044203783157115
>>                 stock                                   => 0.6962010978721442
>>                 have                                    => 0.6464265467298506
>> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.004, 0.02:0.002, 0.02
>>         Top Terms:
>>                 said                                    => 1.864911184196927
>>                 dlrs                                    => 1.199286689822081
>>                 mln                                     => 1.1802134783562215
>>                 pct                                     => 1.1529704214798124
>>                 its                                     => 1.1184398851519701
>>                 from                                    => 1.016647848050332
>>                 company                                 => 0.894703604722841
>>                 mar                                     => 0.879986159541356
>>                 has                                     => 0.8642799128491316
>>                 year                                    => 0.8271823503717782
>>                 inc                                     => 0.7871293745341424
>>                 corp                                    => 0.737705498468879
>>                 which                                   => 0.722975201852743
>>                 would                                   => 0.708000816484415
>>                 u.s                                     => 0.7073294276173905
>>                 billion                                 => 0.7055723996916351
>>                 he                                      => 0.7042684217823294
>>                 new                                     => 0.6834737905434939
>>                 shares                                  => 0.6753327384172428
>>                 stock                                   => 0.6576225144041699
>> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.006, 0.02:0.002, 0.02
>>         Top Terms:
>>                 said                                    => 1.8796076179735086
>>                 its                                     => 1.172025965452378
>>                 dlrs                                    => 1.130422792460914
>>                 pct                                     => 1.082038255241358
>>                 mln                                     => 1.0772146872767114
>>                 company                                 => 0.9662235879639138
>>                 from                                    => 0.9473172871605616
>>                 has                                     => 0.9224712965830099
>>                 mar                                     => 0.8769325856924421
>>                 inc                                     => 0.8360245257169788
>>                 shares                                  => 0.8334595641384324
>>                 stock                                   => 0.7704621839612175
>>                 corp                                    => 0.7682400250301806
>>                 which                                   => 0.7389988207856137
>>                 would                                   => 0.7339708917389389
>>                 year                                    => 0.7088414843731325
>>                 new                                     => 0.7038109468655172
>>                 he                                      => 0.6993994455501005
>>                 u.s                                     => 0.6772649147622415
>>                 share                                   => 0.6241804830055171
>>
>> *lda:*
>>
>> [snip]
>> 21539
>> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
>> 21540
>> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
>> 21541
>> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
>> 21542
>> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
>> 21543
>> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
>> 21544
>> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
>> 21545
>> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
>> 21546
>> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
>> 21547
>> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
>> 21548
>> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
>> 21549
>> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
>> 21550
>> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
>> 21551
>> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
>> 21552
>> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
>> 21553
>> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
>> 21554
>> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
>> 21555
>> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
>> 21556
>> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
>> 21557
>> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
>> 21558
>> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
>> 21559
>> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
>> 21560
>> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
>> 21561
>> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
>> 21562
>> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
>> 21563
>> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
>> 21564
>> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
>> 21565
>> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
>> 21566
>> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
>> 21567
>> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
>> 21568
>> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
>> 21569
>> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
>> 21570
>> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
>> 21571
>> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
>> 21572
>> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
>> 21573
>> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
>> 21574
>> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
>> 21575
>> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
>> 21576
>> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
>> 21577
>> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>>
>>
>> *Streaming k-means:*
>>
>> [snip]
>> INFO: Number of Centroids: 0
>> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
>> WARNING: job_local23982482_0001
>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>
>> [snip]
>>
>> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> Num clusters: 0; maxDistance: 0.000000
>> [Dunn Index] First: Infinity
>> [Davies-Bouldin Index] First: NaN
>> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
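A note on the streaming k-means failure quoted above: the reducer saw 0 centroids, and a 10% test split of 0 vectors leaves both the training and the test set empty. The literal "%.1f %% of %d" in the message is expected too, because Guava's Preconditions.checkArgument only substitutes %s placeholders and appends any unmatched arguments in square brackets. Below is a minimal Python sketch of both effects. It is an illustration only, not Mahout's code: the names lenient_format and split_counts are hypothetical, and the bounds check is an assumption inferred from the wording of the error message.

```python
# Illustration only: names and the bounds check are assumptions, not Mahout source.

TEMPLATE = ("Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test")


def lenient_format(template, *args):
    """Mimic Guava-style formatting: only %s is replaced, in order.

    Arguments that find no %s placeholder are appended in square brackets.
    """
    pieces = template.split("%s")
    remaining = list(args)
    result = pieces[0]
    for piece in pieces[1:]:
        if remaining:
            result += str(remaining.pop(0)) + piece
        else:
            # No argument left for this placeholder: keep it verbatim.
            result += "%s" + piece
    if remaining:
        result += " [" + ", ".join(str(a) for a in remaining) + "]"
    return result


def split_counts(num_points, test_probability):
    """Return (num_train, num_test), rejecting splits that leave either side empty."""
    num_test = int(test_probability * num_points)
    if not (0 < num_test < num_points):
        raise ValueError(
            lenient_format(TEMPLATE, test_probability * 100, num_points))
    return num_points - num_test, num_test


if __name__ == "__main__":
    print(split_counts(100, 0.1))
    try:
        split_counts(0, 0.1)  # zero centroids, as in the log above
    except ValueError as err:
        print(err)
```

With zero input points the sketch raises an error whose text keeps the %.1f and %d placeholders and ends with the two arguments in brackets, the same shape as the message in the log. So the odd-looking message is a formatting quirk; the thing to chase is why the reducer received no centroids.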
>> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
>> andrew.musselman@gmail.com> wrote:
>>
>>> *classify-20newsgroups.sh*
>>>
>>> *Complementary naive bayes:*
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :      11207       98.9406%
>>> Incorrectly Classified Instances        :        120        1.0594%
>>> Total Classified Instances              :      11327
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 475     0       0       1       0       0       0       0       0
>>> 0       0       0       0       0       1       0       1       0       0
>>>    0         |  478         a     = alt.atheism
>>> 0       597     1       1       0       1       1       0       0
>>> 0       0       1       0       2       1       0       0       0       0
>>>    0         |  605         b     = comp.graphics
>>> 0       1       620     3       0       1       0       0       0
>>> 0       0       1       0       0       1       0       0       0       0
>>>    0         |  627         c     = comp.os.ms-windows.misc
>>> 1       1       1       593     2       0       0       0       0
>>> 0       0       0       0       0       0       1       0       0       0
>>>    0         |  599         d     = comp.sys.ibm.pc.hardware
>>> 0       1       1       0       568     0       1       0       0
>>> 0       1       1       2       0       0       0       0       1       0
>>>    0         |  576         e     = comp.sys.mac.hardware
>>> 0       4       2       0       0       581     0       0       0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  587         f     = comp.windows.x
>>> 0       0       0       1       2       0       571     3       0
>>> 0       1       1       4       1       0       0       0       0       0
>>>    0         |  584         g     = misc.forsale
>>> 0       0       0       1       0       0       0       589     1
>>> 0       0       1       1       0       0       0       0       0       0
>>>    0         |  593         h     = rec.autos
>>> 0       0       0       0       0       0       0       1       565
>>> 0       0       0       0       0       1       0       0       0       0
>>>    0         |  567         i     = rec.motorcycles
>>> 0       0       0       0       0       0       0       0       0
>>> 600     2       0       0       0       1       0       0       0       0
>>>    0         |  603         j     = rec.sport.baseball
>>> 0       0       0       0       0       0       0       0       0
>>> 1       584     0       0       0       0       0       0       0       0
>>>    0         |  585         k     = rec.sport.hockey
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       579     0       0       0       0       0       1       0
>>>    0         |  580         l     = sci.crypt
>>> 0       0       0       1       3       0       2       0       0
>>> 2       0       0       567     1       2       1       0       0       0
>>>    0         |  579         m     = sci.electronics
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       1       605     0       0       0       0       0
>>>    0         |  606         n     = sci.med
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       602     0       0       0       0
>>>    0         |  602         o     = sci.space
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       0       1       0       602     0       0       1
>>>    0         |  604         p     = soc.religion.christian
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       0       556     0       0
>>>    0         |  556         q     = talk.politics.mideast
>>> 0       0       1       0       0       0       0       0       0
>>> 0       0       1       0       0       1       0       0       568     0
>>>    0         |  571         r     = talk.politics.guns
>>> 11      0       0       0       0       0       0       0       0
>>> 1       0       0       0       1       3       8       1       4       338
>>>    2         |  369         s     = talk.religion.misc
>>> 0       0       0       0       0       0       0       0       0
>>> 0       1       0       0       0       1       0       3       4       0
>>>    447       |  456         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.9806
>>> Accuracy                                   98.9406%
>>> Reliability                                94.0932%
>>> Reliability (standard deviation)            0.2163
>>>
>>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 15870 ms (Minutes: 0.2645)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>>>
>>> [snip]
>>>
>>> INFO: Complementary Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       6715       89.3071%
>>> Incorrectly Classified Instances        :        804       10.6929%
>>> Total Classified Instances              :       7519
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 298     0       0       0       0       0       0       0       0
>>> 1       0       0       0       1       2       5       1       0       13
>>>    0         |  321         a     = alt.atheism
>>> 0       298     11      6       1       12      2       2       1
>>> 1       3       8       3       4       2       4       1       4       4
>>>    1         |  368         b     = comp.graphics
>>> 1       17      286     16      4       9       6       3       2
>>> 0       1       0       1       7       1       0       2       1       0
>>>    1         |  358         c     = comp.os.ms-windows.misc
>>> 2       6       11      309     9       5       14      8       1
>>> 0       2       0       6       4       2       0       1       2       1
>>>    0         |  383         d     = comp.sys.ibm.pc.hardware
>>> 0       10      8       7       334     7       5       5       2
>>> 0       3       0       2       1       1       0       1       1       0
>>>    0         |  387         e     = comp.sys.mac.hardware
>>> 1       13      7       8       2       355     2       0       2
>>> 0       0       5       1       1       3       0       0       1       0
>>>    0         |  401         f     = comp.windows.x
>>> 0       7       11      29      12      9       268     16      8
>>> 4       3       2       6       4       2       1       3       1       2
>>>    3         |  391         g     = misc.forsale
>>> 0       1       0       0       3       0       7       362     8
>>> 2       2       1       2       0       2       0       1       2       0
>>>    4         |  397         h     = rec.autos
>>> 0       0       0       1       0       0       1       0       423
>>> 0       0       0       2       1       0       1       0       0       0
>>>    0         |  429         i     = rec.motorcycles
>>> 0       0       1       0       0       0       0       2       2
>>> 371     8       0       2       3       0       2       0       0       0
>>>    0         |  391         j     = rec.sport.baseball
>>> 0       0       1       0       0       0       1       0       0
>>> 2       409     0       0       0       0       0       0       0       0
>>>    1         |  414         k     = rec.sport.hockey
>>> 0       0       1       2       1       0       1       0       0
>>> 0       0       404     0       0       0       0       0       1       0
>>>    1         |  411         l     = sci.crypt
>>> 0       5       4       11      1       3       7       9       2
>>> 5       3       3       339     2       6       0       1       1       2
>>>    1         |  405         m     = sci.electronics
>>> 0       4       0       1       0       0       0       1       0
>>> 1       1       0       3       367     3       1       2       0       0
>>>    0         |  384         n     = sci.med
>>> 0       1       2       0       1       0       2       0       0
>>> 1       0       0       1       1       375     0       1       0       0
>>>    0         |  385         o     = sci.space
>>> 4       2       1       1       0       0       1       1       2
>>> 0       0       1       1       5       1       367     4       0       1
>>>    1         |  393         p     = soc.religion.christian
>>> 0       1       0       0       0       0       0       0       0
>>> 2       0       0       0       0       0       2       378     0       1
>>>    0         |  384         q     = talk.politics.mideast
>>> 0       0       0       0       0       2       1       1       1
>>> 1       0       3       0       3       0       0       2       319     2
>>>    4         |  339         r     = talk.politics.guns
>>> 32      0       0       1       0       0       0       0       0
>>> 1       1       1       0       2       2       26      5       7       175
>>>    6         |  259         s     = talk.religion.misc
>>> 0       0       0       2       0       0       0       0       0
>>> 1       2       2       0       1       2       1       10      18      2
>>>    278       |  319         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.8594
>>> Accuracy                                   89.3071%
>>> Reliability                                 84.611%
>>> Reliability (standard deviation)            0.2148
>>>
>>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>>
>>>
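The percentages in the Complementary Naive Bayes summaries above can be re-derived from the raw counts, which is a quick sanity check when comparing test runs. A small Python sketch follows; the helper name percent and the run labels are made up for illustration, and the counts are copied from the quoted output.

```python
# Re-derive the "Correctly/Incorrectly Classified Instances" percentages
# from the counts in the quoted summaries. Labels and names are illustrative.


def percent(part, total):
    """Return part/total as a percentage."""
    return 100.0 * part / total


# (correct, total) pairs as reported in the two summaries above.
RUNS = {
    "first run (before 'Testing on holdout set')": (11207, 11327),
    "holdout run": (6715, 7519),
}

if __name__ == "__main__":
    for label, (correct, total) in RUNS.items():
        incorrect = total - correct
        print(f"{label}: {percent(correct, total):.4f}% correct, "
              f"{percent(incorrect, total):.4f}% incorrect")
```

The results agree with the reported 98.9406% / 1.0594% and 89.3071% / 10.6929%. The Kappa and Reliability figures are not reproduced here, since the output does not say how Mahout computes them.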
>>> *Naive bayes:*
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :      11286       99.0869%
>>> Incorrectly Classified Instances        :        104        0.9131%
>>> Total Classified Instances              :      11390
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 474     0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       0       0       0       2
>>>    1         |  477         a     = alt.atheism
>>> 0       566     0       2       0       1       0       0       0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  569         b     = comp.graphics
>>> 0       10      590     29      2       4       1       0       0
>>> 0       0       0       1       0       0       0       0       0       0
>>>    1         |  638         c     = comp.os.ms-windows.misc
>>> 0       0       0       596     0       0       0       0       0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  596         d     = comp.sys.ibm.pc.hardware
>>> 0       0       0       0       575     0       1       0       0
>>> 0       0       0       1       0       0       0       0       0       0
>>>    0         |  577         e     = comp.sys.mac.hardware
>>> 0       2       2       2       0       593     1       0       0
>>> 0       0       0       0       0       1       0       0       0       0
>>>    0         |  601         f     = comp.windows.x
>>> 0       0       0       1       0       0       589     1       0
>>> 0       1       0       2       0       0       0       0       0       0
>>>    0         |  594         g     = misc.forsale
>>> 0       0       0       0       0       0       0       594     0
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  594         h     = rec.autos
>>> 0       0       0       0       0       0       0       0       611
>>> 0       0       0       0       0       0       0       0       0       0
>>>    0         |  611         i     = rec.motorcycles
>>> 0       0       0       0       0       0       0       0       0
>>> 616     1       0       0       0       0       0       0       0       0
>>>    0         |  617         j     = rec.sport.baseball
>>> 0       0       0       0       0       0       1       0       0
>>> 0       620     0       0       0       0       0       0       0       0
>>>    0         |  621         k     = rec.sport.hockey
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       580     0       0       0       0       0       1       0
>>>    0         |  581         l     = sci.crypt
>>> 0       0       0       3       1       0       0       0       0
>>> 0       0       0       571     0       0       0       0       0       0
>>>    0         |  575         m     = sci.electronics
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       2       583     0       0       0       0       0
>>>    0         |  585         n     = sci.med
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       0       0       1       599     0       0       0       0
>>>    0         |  600         o     = sci.space
>>> 0       1       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       615     0       0       0
>>>    0         |  616         p     = soc.religion.christian
>>> 1       0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       0       1       560     0       0
>>>    0         |  562         q     = talk.politics.mideast
>>> 0       0       1       0       0       0       0       0       0
>>> 0       0       1       0       0       0       0       0       548     0
>>>    1         |  551         r     = talk.politics.guns
>>> 10      0       0       0       0       0       0       0       0
>>> 0       0       0       0       0       1       1       0       2       344
>>>    1         |  359         s     = talk.religion.misc
>>> 0       0       0       0       0       0       0       0       0
>>> 0       0       1       1       0       0       0       0       2       0
>>>    462       |  466         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.9847
>>> Accuracy                                   99.0869%
>>> Reliability                                94.3334%
>>> Reliability (standard deviation)            0.2169
>>>
>>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 14304 ms (Minutes: 0.2384)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>>
>>> [snip]
>>>
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       6718       90.1019%
>>> Incorrectly Classified Instances        :        738        9.8981%
>>> Total Classified Instances              :       7456
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 294     0       0       0       0       0       0       0       0
>>> 0       0       2       0       1       1       6       1       1       16
>>>     0         |  322         a     = alt.atheism
>>> 0       345     6       14      6       11      6       0       0
>>> 0       0       5       7       1       3       0       0       0       0
>>>    0         |  404         b     = comp.graphics
>>> 2       29      177     78      22      19      9       1       0
>>> 0       0       4       2       0       1       1       0       0       1
>>>    1         |  347         c     = comp.os.ms-windows.misc
>>> 1       9       2       335     18      2       10      0       0
>>> 0       1       0       8       0       0       0       0       0       0
>>>    0         |  386         d     = comp.sys.ibm.pc.hardware
>>> 1       4       2       13      347     3       5       1       0
>>> 0       1       0       7       1       0       0       0       1       0
>>>    0         |  386         e     = comp.sys.mac.hardware
>>> 0       20      0       4       0       352     4       0       0
>>> 0       0       0       1       1       3       0       1       0       1
>>>    0         |  387         f     = comp.windows.x
>>> 0       2       0       21      5       1       323     7       2
>>> 2       0       2       12      0       3       0       0       0       0
>>>    1         |  381         g     = misc.forsale
>>> 0       1       0       0       1       0       15      363     8
>>> 1       0       0       4       1       0       0       0       1       0
>>>    1         |  396         h     = rec.autos
>>> 0       1       0       0       0       0       6       6       370
>>> 0       0       0       0       1       0       0       0       0       1
>>>    0         |  385         i     = rec.motorcycles
>>> 1       0       0       1       1       0       2       1       2
>>> 362     5       0       2       0       0       0       0       0       0
>>>    0         |  377         j     = rec.sport.baseball
>>> 0       0       0       1       2       0       0       0       0
>>> 3       371     0       0       0       0       0       0       0       0
>>>    1         |  378         k     = rec.sport.hockey
>>> 0       3       1       0       1       0       2       0       0
>>> 0       0       396     0       1       0       0       1       1       1
>>>    3         |  410         l     = sci.crypt
>>> 0       7       0       7       7       2       6       4       0
>>> 0       0       1       369     2       2       0       0       0       0
>>>    2         |  409         m     = sci.electronics
>>> 0       3       0       2       1       0       2       0       0
>>> 0       0       1       4       383     4       0       0       1       0
>>>    4         |  405         n     = sci.med
>>> 0       5       0       0       1       0       3       0       0
>>> 0       0       0       1       0       374     1       0       0       1
>>>    1         |  387         o     = sci.space
>>> 6       2       0       1       1       0       0       1       0
>>> 1       0       0       1       5       0       352     2       1       7
>>>    1         |  381         p     = soc.religion.christian
>>> 1       1       0       0       0       0       0       0       0
>>> 0       1       0       0       0       0       0       373     1       0
>>>    1         |  378         q     = talk.politics.mideast
>>> 0       0       0       0       0       0       1       0       1
>>> 0       0       2       0       0       0       0       0       346     2
>>>    7         |  359         r     = talk.politics.guns
>>> 26      1       0       1       0       0       0       2       0
>>> 1       1       0       0       1       1       20      2       6       200
>>>    7         |  269         s     = talk.religion.misc
>>> 1       0       0       0       0       0       0       2       0
>>> 0       1       0       0       2       2       0       1       14      0
>>>    286       |  309         t     = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.8726
>>> Accuracy                                   90.1019%
>>> Reliability                                85.4491%
>>> Reliability (standard deviation)            0.2222
>>>
>>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>>
>>> *SGD:*
>>> 7532 test files
>>>
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances          :       5649            75%
>>> Incorrectly Classified Instances        :       1883            25%
>>> Total Classified Instances              :       7532
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a       b       c       d       e       f       g       h       i
>>> j       k       l       m       n       o       p       q       r       s
>>>    t        <--Classified as
>>> 186     6       3       10      5       0       33      4       13
>>>  15      7       1       24      15      3       15      5       5       29
>>>     15        |  394         a     = sci.space
>>> 5       309     0       3       2       5       0       0       0
>>> 1       9       21      2       0       0       18      4       4       1
>>>    1         |  385         b     = comp.sys.mac.hardware
>>> 4       1       101     3       0       1       63      0       7
>>> 0       1       1       5       16      3       0       3       7       1
>>>    34        |  251         c     = talk.religion.misc
>>> 11      12      1       265     1       10      3       0       0
>>> 17      10      11      5       2       0       11      3       6       21
>>>     0         |  389         d     = comp.graphics
>>> 2       1       1       0       349     2       3       0       3
>>> 2       6       1       5       1       0       2       15      2       1
>>>    2         |  398         e     = rec.motorcycles
>>> 7       20      3       19      2       254     6       0       2
>>> 11      2       39      7       2       0       4       2       2       9
>>>    3         |  394         f     = comp.os.ms-windows.misc
>>> 2       1       13      0       0       0       247     0       1
>>> 1       3       0       6       2       4       0       2       3       5
>>>    29        |  319         g     = alt.atheism
>>> 1       1       0       0       2       0       2       361     0
>>> 1       2       0       2       0       0       1       3       22      0
>>>    1         |  399         h     = rec.sport.hockey
>>> 3       0       3       1       0       0       5       0       161
>>> 0       1       2       12      102     0       0       1       2       11
>>>     6         |  310         i     = talk.politics.misc
>>> 2       8       0       19      0       19      0       0       1
>>> 294     10      11      4       2       0       5       0       3       11
>>>     6         |  395         j     = comp.windows.x
>>> 2       10      0       1       1       0       0       0       0
>>> 1       347     13      2       1       0       5       3       2       2
>>>    0         |  390         k     = misc.forsale
>>> 1       36      0       6       1       25      0       0       1
>>> 6       10      257     2       1       0       34      6       0       6
>>>    0         |  392         l     = comp.sys.ibm.pc.hardware
>>> 2       2       2       2       1       0       12      0       0
>>> 6       10      4       312     5       2       13      11      3       3
>>>    6         |  396         m     = sci.med
>>> 2       0       3       2       1       0       0       1       13
>>>  0       5       1       2       314     2       0       2       2       10
>>>     4         |  364         n     = talk.politics.guns
>>> 1       0       2       1       1       0       34      1       33
>>>  1       3       0       1       8       271     1       4       5       6
>>>      3         |  376         o     = talk.politics.mideast
>>> 3       14      0       8       2       8       3       1       1
>>> 7       12      29      6       2       1       245     13      2       32
>>>     4         |  393         p     = sci.electronics
>>> 3       3       0       2       11      0       1       0       2
>>> 1       11      6       4       2       0       11      330     4       4
>>>    1         |  396         q     = rec.autos
>>> 0       0       1       0       1       0       4       12      3
>>> 1       3       0       0       0       0       5       6       359     1
>>>    1         |  397         r     = rec.sport.baseball
>>> 0       1       0       0       0       1       0       0       3
>>> 3       0       0       3       2       1       6       1       6       366
>>>    3         |  396         s     = sci.crypt
>>> 0       2       11      1       1       0       40      0       1
>>> 2       3       4       2       1       0       5       0       2       2
>>>    321       |  398         t     = soc.religion.christian
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa                                       0.7073
>>> Accuracy                                        75%
>>> Reliability                                70.6238%
>>> Reliability (standard deviation)            0.2187
>>> Log-likelihood                mean      :    -1.1182
>>>                               25%-ile   :    -1.6911
>>>                               75%-ile   :    -0.0803
>>>
>>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>>
>>>
>>>
>>>
>>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>>
>>>> Thanks Andrew for reporting that. I rolled back the release to fix this
>>>> and a few other issues.
>>>>
>>>> We have removed asf-examples*.sh from trunk as the sample file at the
>>>> URL mentioned in your email is not available.
>>>> This is something we need to fix and restore in 1.0.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
>>>> ap.dev@outlook.com> wrote:
>>>>
>>>> from the asf-email-examples.sh script:
>>>>
>>>> # You will need to download or otherwise obtain some or all of the
>>>> # Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
>>>> # to use this script.
>>>> # To obtain a full copy you will need to launch an EC2 instance and
>>>> # mount the dataset to download it, otherwise you can get a sample of it at
>>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> It looks like the:
>>>>
>>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> link is down.
>>>>
>>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>>
>>>>
>>>>
>>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>>> > From: andrew.musselman@gmail.com
>>>> > To: dev@mahout.apache.org
>>>> >
>>>> > Sure thing; continuing to smoke test the other examples tonight
>>>> >
>>>> >
>>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com>wrote:
>>>> >
>>>> > > Thanks Andrew M., see that some of the example scripts need to be
>>>> fixed as
>>>> > > they still refer to the deprecated algorithms.
>>>> > > See that the Streaming KMeans has failed for you as well.
>>>> > >
>>>> > > I'll be rolling back the release today to fix these issues.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
>>>> 64-bit
>>>> > > Linux AMI from tarball.
>>>> > >
>>>> > > All tests pass.
>>>> > >
>>>> > > *Output of examples:*
>>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>>> > > <http://mahout.apache.org>:*
>>>> > > *recommendations:*
>>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
>>>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
>>>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>>> > > [snip]
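
For anyone who wants to sanity-check recommendation output like the lines above, here is a minimal sketch of a parser. It assumes each record is a user ID, whitespace, then a bracketed comma-separated list of item:score pairs, as in the sample; the function name parse_recommendation_line is made up for illustration and is not part of Mahout.

```python
def parse_recommendation_line(line):
    """Parse 'userID <whitespace> [item:score,item:score,...]' into (user, pairs).

    Hypothetical helper: the format is inferred from the sample output only.
    """
    user_part, items_part = line.split(None, 1)
    items_part = items_part.strip()
    if not (items_part.startswith("[") and items_part.endswith("]")):
        raise ValueError("expected a bracketed item list: %r" % items_part)
    pairs = []
    for entry in items_part[1:-1].split(","):
        if not entry:
            continue  # tolerate an empty list "[]"
        item_id, score = entry.split(":")
        pairs.append((int(item_id), float(score)))
    return int(user_part), pairs


if __name__ == "__main__":
    user, recs = parse_recommendation_line(
        "8       [12758:1.0,19409:1.0,11112:1.0]")
    print(user, len(recs), recs[0])
```

A check like this makes it easy to spot, for instance, that every score in the sample is 1.0.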
>>>> > >
>>>> > > *clustering; kmeans:*
>>>> > > [snip]
>>>> > >         Weight : [props - optional]:  Point:
>>>> > >         1.0 : [distance-squared=1.0193102046188427]:
>>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
>>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>>> 7573:0.204,
>>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>>> 9779:0.159,
>>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>>> > >         1.0 : [distance-squared=0.9823018320457279]:
>>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>>> 5336:0.106,
>>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>>> 7832:0.072,
>>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>>> > >         1.0 : [distance-squared=0.9509142993214911]:
>>>> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
>>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
>>>> 7235:0.048,
>>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>>> 7683:0.077,
>>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>>> 10225:0.081,
>>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>>> > > 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
>>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>>> > > [snip]
>>>> > >
>>>> > > *clustering; dirichlet:*
>>>> > > Get this complaint:
>>>> > > Running Dirichlet with K = 8
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
>>>> > > Unknown program 'dirichlet' chosen.
>>>> > >
>>>> > > *clustering: minhash:*
>>>> > > Running Minhash
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
>>>> > > Unknown program 'minhash' chosen.
>>>> > >
>>>> > > *classification; standard:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances          :       5384       87.7874%
>>>> > > Incorrectly Classified Instances        :        749       12.2126%
>>>> > > Total Classified Instances              :       6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a       b       c       d       <--Classified as
>>>> > > 2949    7       531     25       |  3512        a     = dev
>>>> > > 0       0       0       0        |  0           b     = general
>>>> > > 99      8       1763    8        |  1878        c     = user
>>>> > > 41      1       29      672      |  743         d     = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa                                       0.7877
>>>> > > Accuracy                                   87.7874%
>>>> > > Reliability                                 53.658%
>>>> > > Reliability (standard deviation)            0.4911
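
As a quick cross-check of the summary figures against the confusion matrix above, here is a minimal sketch. It copies the 4x4 matrix from the standard-classifier run and assumes the diagonal holds the correctly classified counts; the function name summarize is just for illustration. It deliberately does not try to reproduce the Kappa or Reliability figures.

```python
# Confusion matrix copied from the "classification; standard" output above.
# Rows follow the listed labels: dev, general, user, commits.
MATRIX = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]


def summarize(matrix):
    """Return (correct, total, accuracy_percent) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct, total, 100.0 * correct / total


if __name__ == "__main__":
    correct, total, accuracy = summarize(MATRIX)
    print("correct=%d total=%d accuracy=%.4f%%" % (correct, total, accuracy))
```

The counts agree with the reported 5384 of 6133 instances, i.e. 87.7874%.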
>>>> > >
>>>> > > *classification; complementary:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances          :       5530       90.1679%
>>>> > > Incorrectly Classified Instances        :        603        9.8321%
>>>> > > Total Classified Instances              :       6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a       b       c       d       <--Classified as
>>>> > > 3168    0       276     68       |  3512        a     = dev
>>>> > > 0       0       0       0        |  0           b     = general
>>>> > > 196     0       1652    30       |  1878        c     = user
>>>> > > 25      0       8       710      |  743         d     = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa                                       0.8259
>>>> > > Accuracy                                   90.1679%
>>>> > > Reliability                                54.7459%
>>>> > > Reliability (standard deviation)            0.5005
>>>> > >
>>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>>>> > >
>>>> > > *classification; sgd, with three categories:*
>>>> > > Running SGD Training
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
>>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
>>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
>>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
>>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
>>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>>> > > 24168 training files
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
>>>> > > 0.000   0.00    none
>>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
>>>> > > 0.000   0.00    none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1000    -0.607  75.78   none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1200    -0.607  75.78   none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1400    -0.607  75.78   none
>>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>>>> > > 1.0019413e-08   1500    -0.607  75.78   none
>>>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
>>>> > > 1.0032261e-08   2000    -0.487  82.65   none
>>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
>>>> > > 1.0011902e-08   2500    -0.439  83.90   none
>>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
>>>> > > 1.0011902e-08   3000    -0.439  83.90   none
>>>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
>>>> > > 1.0000001e-08   4000    -0.351  88.14   none
>>>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
>>>> > > 1.0000000e-08   5000    -0.378  87.10   none
>>>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
>>>> > > 1.0000001e-08   6000    -0.372  86.89   none
>>>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
>>>> > > 1.0000001e-08   7000    -0.334  89.26   none
>>>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
>>>> > > 1.0000000e-08   8000    -0.368  87.52   none
>>>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
>>>> > > 1.0000000e-08   10000   -0.374  87.39   none
>>>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
>>>> > > 1.0000000e-08   12000   -0.298  88.26   none
>>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> > >         at java.lang.Thread.run(Thread.java:701)
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > > Trying out the build today
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com
>>>> > > >wrote:
>>>> > > >
>>>> > > >> This is an issue (trivial one though) that needs to be fixed for
>>>> 0.9
>>>> > > >> Release, will be rerolling the release today (in the next few
>>>> hrs) and
>>>> > > >> putting out a new release candidate in staging.
>>>> > > >>
>>>> > > >> Thanks for reporting this Andrew P.
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>>> > > ap.dev@outlook.com>
>>>> > > >> wrote:
>>>> > > >>
>>>> > > >> I ran through the tests on a CentOS VM
>>>> > >  AMD64 2 cores 4 GB RAM.  Had
>>>> > > >> a bit of trouble getting the Hadoop natives to compile and
>>>> therefore may
>>>> > > >> have run into some problems because of the hadoop setup.  Ran
>>>> into some
>>>> > > >> problems in the example scripts.  Particularly with
>>>> > > >> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the
>>>> rest of the
>>>> > > >> examples when I'm sure I've got Hadoop set up right.
>>>> > > >>
>>>> > > >>
>>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
>>>> "amd64",
>>>> > > >> family: "unix"
>>>> > > >> $MAHOUT_LOCAL=true
>>>> > > >> Hadoop 2.2.0
>>>> > > >>
>>>> > > >>
>>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed
>>>> (tar)
>>>> > > >> [passed ]
>>>> > > >>
>>>> > > >> b) Verify u r able to compile the
>>>> > >  distro
>>>> > > >>
>>>> > > >>     mvn compile- [passed with warnings]
>>>> > > >>
>>>> > > >>     [WARNING]  Expected all dependencies to require Scala
>>>> version: 2.9.3
>>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires
>>>> scala
>>>> > > >> version: 2.9.3
>>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala
>>>> > > >> version: 2.9.2
>>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
>>>> > > >>
>>>> > > >> c)  Run through the unit tests: mvn clean test
>>>> > > >>     mvn clean test [passed]
>>>> > > >>
>>>> > > >> d) Run the
>>>> > > >>  example scripts under $MAHOUT_HOME/examples/bin.
>>>> > > >> Please run through all the different options in each script
>>>> > > >>
>>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
>>>> > > >>
>>>> > > >>
>>>> > >  ./cluster-syntheticcontrol.sh ->1 [works]
>>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
>>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>>> > > >>     [...]
>>>> > > >>     WARNING: Unable to add class:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >>     java.lang.ClassNotFoundException:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >>         at
>>>> > > >>  java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >>         at java.security.AccessController.doPrivileged(Native
>>>> Method)
>>>> > > >>         at
>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >>         at
>>>> > >  java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >>         at
>>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >>         at java.lang.Class.forName0(Native Method)
>>>> > > >>         at java.lang.Class.forName(Class.java:171)
>>>> > > >>         at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >>         at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>>> > > >>
>>>> > > >>     WARNING: Unable to add class:
>>>> > > >>
>>>> > >  org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >>     java.lang.ClassNotFoundException:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >>         at java.security.AccessController.doPrivileged(Native
>>>> Method)
>>>> > > >>         at
>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >>         at
>>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >>         at java.lang.Class.forName0(Native Method)
>>>> > > >>         at
>>>> > >  java.lang.Class.forName(Class.java:171)
>>>> > > >>         at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >>         at
>>>> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>     WARNING: No
>>>> > > >>
>>>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found
>>>> > > on
>>>> > > >> classpath, will use command-line arguments only
>>>> > > >>     Unknown program
>>>> > > >>  'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job'
>>>> chosen.
>>>> > > >>
>>>> > > >>
>>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
>>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >>     cluster-reuters.sh ->1 [works]
>>>> > > >>
>>>> > >  cluster-reuters.sh ->2 [works]
>>>> > > >>     cluster-reuters.sh ->3 [works]
>>>> > > >>
>>>> > > >>     Same error as noted previously in the thread:
>>>> > > >>
>>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
>>>> > > >>
>>>> > > >>     [...]
>>>> > > >>
>>>> > > >>     WARNING: No qualcluster.props found on classpath, will use
>>>> > > >> command-line arguments only
>>>> > > >>     Num clusters: 0; maxDistance: 0.000000
>>>> > > >>     [Dunn Index]
>>>> > > >>  First: Infinity
>>>> > > >>     [Davies-Bouldin Index] First: NaN
>>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
>>>> > > >>     cluster,distance.mean,distance.sd
>>>> > > >>
>>>> > >
>>>> > >
>>>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>>>> > > >> > From: suneel_marthi@yahoo.com
>>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>>>> > > >> >
>>>> > > >> > Third time's a Charm!!!
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Here's the new URL for Mahout 0.9 Release:
>>>> > > >> >
>>>> > > >>
>>>> > >
>>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>>> > > >> >
>>>> > > >> > For those volunteering to test this, some of the things to be
>>>> > > verified:
>>>> > > >> >
>>>> > > >> > a) Verify that u can unpack the release (tar or zip)
>>>> > > >> > b) Verify u r able to compile the distro
>>>> > > >> > c)  Run through the unit tests: mvn clean test
>>>> > > >> > d) Run the example scripts
>>>> > > >>  under $MAHOUT_HOME/examples/bin. Please run through all the
>>>> different
>>>> > > >> options in each script.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Committers
>>>> > > >> >  and PMC members:
>>>> > > >> > ---------------------------------------
>>>> > > >> >
>>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Thanks and
>>>> > >  Regards.
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > >
>>>>
>>>
>>>
>>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
*cluster-syntheticcontrol.sh*

*Canopy:*
[snip]
        1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
10.017, 17.149, 14.850, 10.890]
        1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
16.430, 11.898, 15.366]
        1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
15.285, 22.528, 20.657, 24.129]
        1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
20.229, 11.131, 9.980, 10.720]
        1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
16.546, 15.927, 18.084, 17.475]
        1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
19.310, 12.999, 17.460]
        1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
11.743, 11.699, 10.152]
Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Wrote 6 clusters
Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 5827 ms (Minutes: 0.09711666666666667)

*K-means:*
[snip]
        1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
36.946, 36.900, 32.593, 31.563]
        1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
18.967, 35.473, 30.146, 45.527]
        1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
41.024, 37.105, 45.623, 43.589]
        1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
24.508, 34.432, 32.155, 34.839]
        1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
18.111, 14.754, 27.386, 27.026]
        1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
39.404, 38.034, 46.458, 44.432]
        1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
28.215, 34.940, 36.910, 39.749]
Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Wrote 6 clusters
Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 16902 ms (Minutes: 0.2817)

*Fuzzy k-means:*
[snip]
        1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
15.285, 22.528, 20.657, 24.129]
        1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
20.229, 11.131, 9.980, 10.720]
        1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
16.546, 15.927, 18.084, 17.475]
        1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
19.310, 12.999, 17.460]
        1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
11.743, 11.699, 10.152]
Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Wrote 6 clusters
Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 106615 ms (Minutes: 1.7769166666666667)

*Dirichlet and Meanshift:*
Already detailed in M-1400: the script still references deprecated jobs.
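
For anyone who wants to confirm locally that those job classes are really absent from the build (and not just missing from the classpath), here is a minimal sketch. It relies only on the fact that a jar is a zip archive; the helper name `classes_in_jar` and any jar path passed to it are my own assumptions, not anything shipped with Mahout.

```python
import zipfile


def classes_in_jar(jar_path, class_names):
    """Hypothetical helper: map each fully qualified class name to True/False
    depending on whether the jar holds a matching .class entry."""
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
    # A class a.b.C is stored in the archive as the entry a/b/C.class
    return {
        name: (name.replace(".", "/") + ".class") in entries
        for name in class_names
    }
```

Calling it with the path to the examples job jar from your own build and the two class names from the traces earlier in the thread (org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job and org.apache.mahout.clustering.syntheticcontrol.meanshift.Job) should return False for both if the jobs were removed.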



On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> *cluster-reuters.sh*
> *k-means:*
>
> [snip]
> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
>         Top Terms:
>                 banks                                   =>
> 3.841823268955143
>                 bank                                    =>
>  3.80633066361209
>                 debt                                    =>
>  3.28065219870794
>                 said                                    =>
>  2.5965700942088583
>                 he                                      =>
> 2.335682813857497
>                 foreign                                 =>
>  2.2217853688201403
>                 billion                                 =>
>  2.1970193848291335
>                 would                                   =>
>  1.9932392063955617
>                 loans                                   =>
>  1.9309276792854233
>                 interest                                =>
>  1.787324501938
>                 have                                    =>
> 1.762981951432578
>                 its                                     =>
>  1.7615109954971866
>                 which                                   =>
>  1.5822081148036862
>                 has                                     =>
>  1.5600708189041956
>                 dlrs                                    =>
>  1.5571038313005996
>                 finance                                 =>
>  1.5539758811252924
>                 new                                     =>
>  1.5176015811577555
>                 had                                     =>
>  1.5138723701401844
>                 brazil                                  =>
>  1.5083369853593172
>                 payments                                =>
>  1.4539044255886517
>         Weight : [props - optional]:  Point:
>
> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> 0.40:0.003, 0.5:0.009, 0.57:0
>         Top Terms:
>                 vs                                      =>
> 6.126130791333171
>                 net                                     =>
> 4.012191567277523
>                 cts                                     =>
> 3.822006848832744
>                 shr                                     =>
>  3.6786004856764527
>                 mln                                     =>
>  2.9011643584038698
>                 loss                                    =>
> 2.788368861463607
>                 qtr                                     =>
> 2.714140225051522
>                 revs                                    =>
>  2.4739861236454717
>                 profit                                  =>
>  1.8146888090247015
>                 note                                    =>
>  1.7977163272138388
>                 dlrs                                    =>
>  1.6164390808155846
>                 avg                                     =>
>  1.3901765773336587
>                 shrs                                    =>
>  1.3856326531419314
>                 mths                                    =>
>  1.3168717272038506
>                 4th                                     =>
>  1.2161158425617289
>                 oper                                    =>
> 1.182419473776814
>                 year                                    =>
> 1.178086061733047
>                 nine                                    =>
>  1.0670554836445316
>                 3rd                                     =>
> 1.041334410056592
>                 inc                                     =>
>  1.0019361981554935
>         Weight : [props - optional]:  Point:
>
>
> Inter-Cluster Density: 0.45562152681859414
> Intra-Cluster Density: 0.6952712632167628
> CDbw Inter-Cluster Density: 0.0
> CDbw Intra-Cluster Density: 16.486930227598684
> CDbw Separation: 194.49005884464628
>
> *fuzzy k-means:*
> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> 0.01:0.005, 0.02:0.002, 0.0
>         Top Terms:
>                 said                                    =>
>  1.8665592354713065
>                 its                                     =>
>  1.1335212213411592
>                 pct                                     =>
>  1.0862816801353348
>                 dlrs                                    =>
>  1.0854998884993752
>                 mln                                     =>
> 1.043163996400643
>                 from                                    =>
>  0.9684961110525736
>                 has                                     =>
> 0.912161511978058
>                 company                                 =>
>  0.8754186972808333
>                 mar                                     =>
>  0.8675333452422878
>                 inc                                     =>
>  0.7678617590362815
>                 would                                   =>
>  0.7610968883652675
>                 he                                      =>
>  0.7459988770503974
>                 which                                   =>
>  0.7435613119406804
>                 year                                    =>
>  0.7302840632748394
>                 u.s                                     =>
>  0.7281061062439116
>                 shares                                  =>
>  0.7260764102983083
>                 corp                                    =>
>  0.7179807367808658
>                 new                                     =>
>  0.7044203783157115
>                 stock                                   =>
>  0.6962010978721442
>                 have                                    =>
>  0.6464265467298506
> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> 0.01:0.004, 0.02:0.002, 0.02
>         Top Terms:
>                 said                                    =>
> 1.864911184196927
>                 dlrs                                    =>
> 1.199286689822081
>                 mln                                     =>
>  1.1802134783562215
>                 pct                                     =>
>  1.1529704214798124
>                 its                                     =>
>  1.1184398851519701
>                 from                                    =>
> 1.016647848050332
>                 company                                 =>
> 0.894703604722841
>                 mar                                     =>
> 0.879986159541356
>                 has                                     =>
>  0.8642799128491316
>                 year                                    =>
>  0.8271823503717782
>                 inc                                     =>
>  0.7871293745341424
>                 corp                                    =>
> 0.737705498468879
>                 which                                   =>
> 0.722975201852743
>                 would                                   =>
> 0.708000816484415
>                 u.s                                     =>
>  0.7073294276173905
>                 billion                                 =>
>  0.7055723996916351
>                 he                                      =>
>  0.7042684217823294
>                 new                                     =>
>  0.6834737905434939
>                 shares                                  =>
>  0.6753327384172428
>                 stock                                   =>
>  0.6576225144041699
> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> 0.01:0.006, 0.02:0.002, 0.02
>         Top Terms:
>                 said                                    =>
>  1.8796076179735086
>                 its                                     =>
> 1.172025965452378
>                 dlrs                                    =>
> 1.130422792460914
>                 pct                                     =>
> 1.082038255241358
>                 mln                                     =>
>  1.0772146872767114
>                 company                                 =>
>  0.9662235879639138
>                 from                                    =>
>  0.9473172871605616
>                 has                                     =>
>  0.9224712965830099
>                 mar                                     =>
>  0.8769325856924421
>                 inc                                     =>
>  0.8360245257169788
>                 shares                                  =>
>  0.8334595641384324
>                 stock                                   =>
>  0.7704621839612175
>                 corp                                    =>
>  0.7682400250301806
>                 which                                   =>
>  0.7389988207856137
>                 would                                   =>
>  0.7339708917389389
>                 year                                    =>
>  0.7088414843731325
>                 new                                     =>
>  0.7038109468655172
>                 he                                      =>
>  0.6993994455501005
>                 u.s                                     =>
>  0.6772649147622415
>                 share                                   =>
>  0.6241804830055171
>
> *lda:*
>
> [snip]
> 21539
> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> 21540
> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> 21541
> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> 21542
> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> 21543
> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> 21544
> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> 21545
> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> 21546
> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> 21547
> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> 21548
> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> 21549
> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> 21550
> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> 21551
> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> 21552
> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> 21553
> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> 21554
> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> 21555
> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> 21556
> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> 21557
> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> 21558
> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> 21559
> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> 21560
> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> 21561
> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> 21562
> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> 21563
> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> 21564
> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> 21565
> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> 21566
> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> 21567
> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> 21568
> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> 21569
> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> 21570
> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> 21571
> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> 21572
> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> 21573
> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> 21574
> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> 21575
> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> 21576
> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> 21577
> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>
>
> *Streaming k-means:*
>
> [snip]
> INFO: Number of Centroids: 0
> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local23982482_0001
> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>         at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
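
The literal %.1f, %% and %d in the message above, followed by the values in square brackets, are consistent with how Guava's Preconditions.checkArgument builds its message: it substitutes only %s placeholders and appends any unmatched arguments in brackets. Below is a minimal sketch of formatting such a message with String.format before throwing. The class and method names (SplitMessageSketch, describe, checkSplit) are hypothetical, and the split rule is an assumption made for illustration; this is not the BallKMeans code.

```java
import java.util.Locale;

// Hypothetical sketch, not Mahout code: format the message with
// String.format so that %.1f, %% and %d are actually interpolated.
class SplitMessageSketch {

    // Builds the error text; testFraction is a fraction such as 0.1 (assumption).
    static String describe(double testFraction, int numVectors) {
        return String.format(Locale.ROOT,
            "Must have nonzero number of training and test vectors. "
                + "Asked for %.1f %% of %d vectors for test",
            testFraction * 100, numVectors);
    }

    // Assumed split rule: both the test part and the training part must be non-empty.
    static int checkSplit(double testFraction, int numVectors) {
        int numTest = (int) (testFraction * numVectors);
        if (numTest <= 0 || numTest >= numVectors) {
            throw new IllegalArgumentException(describe(testFraction, numVectors));
        }
        return numTest;
    }

    public static void main(String[] args) {
        // With zero input vectors, as in the log above, the check fails
        // and the message carries the interpolated values.
        System.out.println(describe(0.1, 0));
        System.out.println(checkSplit(0.1, 100));
    }
}
```

Formatting the message up front costs a String.format call on every check, so a real fix might only format inside the failure branch, as checkSplit does here.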
>
> [snip]
>
> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> Num clusters: 0; maxDistance: 0.000000
> [Dunn Index] First: Infinity
> [Davies-Bouldin Index] First: NaN
> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
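
The "Infinity" and "NaN" values reported above for zero clusters are what plain floating-point arithmetic yields over an empty set of centroids. The sketch below illustrates that under stated assumptions: a minimum initialised to +Infinity and a mean computed as sum / count. It is not Mahout's cluster-quality code, and the class and method names are hypothetical.

```java
// Hypothetical sketch, not Mahout code: shows how Infinity and NaN can
// arise when quality measures are evaluated over zero clusters.
class EmptyClusteringSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Smallest pairwise distance between centroids; with fewer than two
    // centroids the loop body never runs and the result stays +Infinity.
    static double minCentroidDistance(double[][] centroids) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            for (int j = i + 1; j < centroids.length; j++) {
                min = Math.min(min, distance(centroids[i], centroids[j]));
            }
        }
        return min;
    }

    // Arithmetic mean; for an empty array this evaluates 0.0 / 0, which is NaN.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println(minCentroidDistance(new double[0][]));
        System.out.println(mean(new double[0]));
    }
}
```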
>
>
> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
>
>> *classify-20newsgroups.sh*
>>
>> *Complementary naive bayes:*
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances          :      11207       98.9406%
>> Incorrectly Classified Instances        :        120        1.0594%
>> Total Classified Instances              :      11327
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
>> 475     0       0       1       0       0       0       0       0       0       0       0       0       0       1       0       1       0       0       0       |  478         a     = alt.atheism
>> 0       597     1       1       0       1       1       0       0       0       0       1       0       2       1       0       0       0       0       0       |  605         b     = comp.graphics
>> 0       1       620     3       0       1       0       0       0       0       0       1       0       0       1       0       0       0       0       0       |  627         c     = comp.os.ms-windows.misc
>> 1       1       1       593     2       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       |  599         d     = comp.sys.ibm.pc.hardware
>> 0       1       1       0       568     0       1       0       0       0       1       1       2       0       0       0       0       1       0       0       |  576         e     = comp.sys.mac.hardware
>> 0       4       2       0       0       581     0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  587         f     = comp.windows.x
>> 0       0       0       1       2       0       571     3       0       0       1       1       4       1       0       0       0       0       0       0       |  584         g     = misc.forsale
>> 0       0       0       1       0       0       0       589     1       0       0       1       1       0       0       0       0       0       0       0       |  593         h     = rec.autos
>> 0       0       0       0       0       0       0       1       565     0       0       0       0       0       1       0       0       0       0       0       |  567         i     = rec.motorcycles
>> 0       0       0       0       0       0       0       0       0       600     2       0       0       0       1       0       0       0       0       0       |  603         j     = rec.sport.baseball
>> 0       0       0       0       0       0       0       0       0       1       584     0       0       0       0       0       0       0       0       0       |  585         k     = rec.sport.hockey
>> 0       0       0       0       0       0       0       0       0       0       0       579     0       0       0       0       0       1       0       0       |  580         l     = sci.crypt
>> 0       0       0       1       3       0       2       0       0       2       0       0       567     1       2       1       0       0       0       0       |  579         m     = sci.electronics
>> 0       0       0       0       0       0       0       0       0       0       0       0       1       605     0       0       0       0       0       0       |  606         n     = sci.med
>> 0       0       0       0       0       0       0       0       0       0       0       0       0       0       602     0       0       0       0       0       |  602         o     = sci.space
>> 0       0       0       0       0       0       0       0       0       0       0       0       0       1       0       602     0       0       1       0       |  604         p     = soc.religion.christian
>> 0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       556     0       0       0       |  556         q     = talk.politics.mideast
>> 0       0       1       0       0       0       0       0       0       0       0       1       0       0       1       0       0       568     0       0       |  571         r     = talk.politics.guns
>> 11      0       0       0       0       0       0       0       0       1       0       0       0       1       3       8       1       4       338     2       |  369         s     = talk.religion.misc
>> 0       0       0       0       0       0       0       0       0       0       1       0       0       0       1       0       3       4       0       447     |  456         t     = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa                                       0.9806
>> Accuracy                                   98.9406%
>> Reliability                                94.0932%
>> Reliability (standard deviation)            0.2163
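
The percentages in the Summary block above follow directly from the instance counts. A minimal sketch of that arithmetic is given here; the class and method names (AccuracySketch, percent, format) are hypothetical and this is not the Mahout code that printed the report.

```java
import java.util.Locale;

// Hypothetical sketch: recompute the Summary percentages from the
// instance counts reported in the log above.
class AccuracySketch {

    // Share of `part` in `total`, expressed as a percentage.
    static double percent(long part, long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * part / total;
    }

    // Four decimal places, matching the precision shown in the report.
    static String format(double pct) {
        return String.format(Locale.ROOT, "%.4f%%", pct);
    }

    public static void main(String[] args) {
        long correct = 11207;   // Correctly Classified Instances
        long incorrect = 120;   // Incorrectly Classified Instances
        long total = correct + incorrect;
        System.out.println(format(percent(correct, total)));   // 98.9406%
        System.out.println(format(percent(incorrect, total))); // 1.0594%
    }
}
```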
>>
>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 15870 ms (Minutes: 0.2645)
>> + echo 'Testing on holdout set'
>> Testing on holdout set
>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>>
>> [snip]
>>
>> INFO: Complementary Results:
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances          :       6715       89.3071%
>> Incorrectly Classified Instances        :        804       10.6929%
>> Total Classified Instances              :       7519
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
>> 298     0       0       0       0       0       0       0       0       1       0       0       0       1       2       5       1       0       13      0       |  321         a     = alt.atheism
>> 0       298     11      6       1       12      2       2       1       1       3       8       3       4       2       4       1       4       4       1       |  368         b     = comp.graphics
>> 1       17      286     16      4       9       6       3       2       0       1       0       1       7       1       0       2       1       0       1       |  358         c     = comp.os.ms-windows.misc
>> 2       6       11      309     9       5       14      8       1       0       2       0       6       4       2       0       1       2       1       0       |  383         d     = comp.sys.ibm.pc.hardware
>> 0       10      8       7       334     7       5       5       2       0       3       0       2       1       1       0       1       1       0       0       |  387         e     = comp.sys.mac.hardware
>> 1       13      7       8       2       355     2       0       2       0       0       5       1       1       3       0       0       1       0       0       |  401         f     = comp.windows.x
>> 0       7       11      29      12      9       268     16      8       4       3       2       6       4       2       1       3       1       2       3       |  391         g     = misc.forsale
>> 0       1       0       0       3       0       7       362     8       2       2       1       2       0       2       0       1       2       0       4       |  397         h     = rec.autos
>> 0       0       0       1       0       0       1       0       423     0       0       0       2       1       0       1       0       0       0       0       |  429         i     = rec.motorcycles
>> 0       0       1       0       0       0       0       2       2       371     8       0       2       3       0       2       0       0       0       0       |  391         j     = rec.sport.baseball
>> 0       0       1       0       0       0       1       0       0       2       409     0       0       0       0       0       0       0       0       1       |  414         k     = rec.sport.hockey
>> 0       0       1       2       1       0       1       0       0       0       0       404     0       0       0       0       0       1       0       1       |  411         l     = sci.crypt
>> 0       5       4       11      1       3       7       9       2       5       3       3       339     2       6       0       1       1       2       1       |  405         m     = sci.electronics
>> 0       4       0       1       0       0       0       1       0       1       1       0       3       367     3       1       2       0       0       0       |  384         n     = sci.med
>> 0       1       2       0       1       0       2       0       0       1       0       0       1       1       375     0       1       0       0       0       |  385         o     = sci.space
>> 4       2       1       1       0       0       1       1       2       0       0       1       1       5       1       367     4       0       1       1       |  393         p     = soc.religion.christian
>> 0       1       0       0       0       0       0       0       0       2       0       0       0       0       0       2       378     0       1       0       |  384         q     = talk.politics.mideast
>> 0       0       0       0       0       2       1       1       1       1       0       3       0       3       0       0       2       319     2       4       |  339         r     = talk.politics.guns
>> 32      0       0       1       0       0       0       0       0       1       1       1       0       2       2       26      5       7       175     6       |  259         s     = talk.religion.misc
>> 0       0       0       2       0       0       0       0       0       1       2       2       0       1       2       1       10      18      2       278     |  319         t     = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa                                       0.8594
>> Accuracy                                   89.3071%
>> Reliability                                 84.611%
>> Reliability (standard deviation)            0.2148
>>
>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>
>>
>> *Naive bayes:*
>> INFO: Standard NB Results:
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances          :      11286       99.0869%
>> Incorrectly Classified Instances        :        104        0.9131%
>> Total Classified Instances              :      11390
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
>> 474     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       2       1       |  477         a     = alt.atheism
>> 0       566     0       2       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  569         b     = comp.graphics
>> 0       10      590     29      2       4       1       0       0       0       0       0       1       0       0       0       0       0       0       1       |  638         c     = comp.os.ms-windows.misc
>> 0       0       0       596     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  596         d     = comp.sys.ibm.pc.hardware
>> 0       0       0       0       575     0       1       0       0       0       0       0       1       0       0       0       0       0       0       0       |  577         e     = comp.sys.mac.hardware
>> 0       2       2       2       0       593     1       0       0       0       0       0       0       0       1       0       0       0       0       0       |  601         f     = comp.windows.x
>> 0       0       0       1       0       0       589     1       0       0       1       0       2       0       0       0       0       0       0       0       |  594         g     = misc.forsale
>> 0       0       0       0       0       0       0       594     0       0       0       0       0       0       0       0       0       0       0       0       |  594         h     = rec.autos
>> 0       0       0       0       0       0       0       0       611     0       0       0       0       0       0       0       0       0       0       0       |  611         i     = rec.motorcycles
>> 0       0       0       0       0       0       0       0       0       616     1       0       0       0       0       0       0       0       0       0       |  617         j     = rec.sport.baseball
>> 0       0       0       0       0       0       1       0       0       0       620     0       0       0       0       0       0       0       0       0       |  621         k     = rec.sport.hockey
>> 0       0       0       0       0       0       0       0       0       0       0       580     0       0       0       0       0       1       0       0       |  581         l     = sci.crypt
>> 0       0       0       3       1       0       0       0       0       0       0       0       571     0       0       0       0       0       0       0       |  575         m     = sci.electronics
>> 0       0       0       0       0       0       0       0       0       0       0       0       2       583     0       0       0       0       0       0       |  585         n     = sci.med
>> 0       0       0       0       0       0       0       0       0       0       0       0       0       1       599     0       0       0       0       0       |  600         o     = sci.space
>> 0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       615     0       0       0       0       |  616         p     = soc.religion.christian
>> 1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       1       560     0       0       0       |  562         q     = talk.politics.mideast
>> 0       0       1       0       0       0       0       0       0       0       0       1       0       0       0       0       0       548     0       1       |  551         r     = talk.politics.guns
>> 10      0       0       0       0       0       0       0       0       0       0       0       0       0       1       1       0       2       344     1       |  359         s     = talk.religion.misc
>> 0       0       0       0       0       0       0       0       0       0       0       1       1       0       0       0       0       2       0       462     |  466         t     = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa                                       0.9847
>> Accuracy                                   99.0869%
>> Reliability                                94.3334%
>> Reliability (standard deviation)            0.2169
>>
>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 14304 ms (Minutes: 0.2384)
>> + echo 'Testing on holdout set'
>> Testing on holdout set
>>
>> [snip]
>>
>> INFO: Standard NB Results:
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances          :       6718       90.1019%
>> Incorrectly Classified Instances        :        738        9.8981%
>> Total Classified Instances              :       7456
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
>> 294     0       0       0       0       0       0       0       0       0       0       2       0       1       1       6       1       1       16      0       |  322         a     = alt.atheism
>> 0       345     6       14      6       11      6       0       0       0       0       5       7       1       3       0       0       0       0       0       |  404         b     = comp.graphics
>> 2       29      177     78      22      19      9       1       0       0       0       4       2       0       1       1       0       0       1       1       |  347         c     = comp.os.ms-windows.misc
>> 1       9       2       335     18      2       10      0       0       0       1       0       8       0       0       0       0       0       0       0       |  386         d     = comp.sys.ibm.pc.hardware
>> 1       4       2       13      347     3       5       1       0       0       1       0       7       1       0       0       0       1       0       0       |  386         e     = comp.sys.mac.hardware
>> 0       20      0       4       0       352     4       0       0       0       0       0       1       1       3       0       1       0       1       0       |  387         f     = comp.windows.x
>> 0       2       0       21      5       1       323     7       2       2       0       2       12      0       3       0       0       0       0       1       |  381         g     = misc.forsale
>> 0       1       0       0       1       0       15      363     8       1       0       0       4       1       0       0       0       1       0       1       |  396         h     = rec.autos
>> 0       1       0       0       0       0       6       6       370     0       0       0       0       1       0       0       0       0       1       0       |  385         i     = rec.motorcycles
>> 1       0       0       1       1       0       2       1       2       362     5       0       2       0       0       0       0       0       0       0       |  377         j     = rec.sport.baseball
>> 0       0       0       1       2       0       0       0       0       3       371     0       0       0       0       0       0       0       0       1       |  378         k     = rec.sport.hockey
>> 0       3       1       0       1       0       2       0       0       0       0       396     0       1       0       0       1       1       1       3       |  410         l     = sci.crypt
>> 0       7       0       7       7       2       6       4       0       0       0       1       369     2       2       0       0       0       0       2       |  409         m     = sci.electronics
>> 0       3       0       2       1       0       2       0       0       0       0       1       4       383     4       0       0       1       0       4       |  405         n     = sci.med
>> 0       5       0       0       1       0       3       0       0       0       0       0       1       0       374     1       0       0       1       1       |  387         o     = sci.space
>> 6       2       0       1       1       0       0       1       0       1       0       0       1       5       0       352     2       1       7       1       |  381         p     = soc.religion.christian
>> 1       1       0       0       0       0       0       0       0       0       1       0       0       0       0       0       373     1       0       1       |  378         q     = talk.politics.mideast
>> 0       0       0       0       0       0       1       0       1       0       0       2       0       0       0       0       0       346     2       7       |  359         r     = talk.politics.guns
>> 26      1       0       1       0       0       0       2       0       1       1       0       0       1       1       20      2       6       200     7       |  269         s     = talk.religion.misc
>> 1       0       0       0       0       0       0       2       0       0       1       0       0       2       2       0       1       14      0       286     |  309         t     = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa                                       0.8726
>> Accuracy                                   90.1019%
>> Reliability                                85.4491%
>> Reliability (standard deviation)            0.2222
>>
>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>
>> *SGD:*
>> 7532 test files
>>
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances          :       5649            75%
>> Incorrectly Classified Instances        :       1883            25%
>> Total Classified Instances              :       7532
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
>> 186  6    3    10   5    0    33   4    13   15   7    1    24   15   3    15   5    5    29   15   |  394  a = sci.space
>> 5    309  0    3    2    5    0    0    0    1    9    21   2    0    0    18   4    4    1    1    |  385  b = comp.sys.mac.hardware
>> 4    1    101  3    0    1    63   0    7    0    1    1    5    16   3    0    3    7    1    34   |  251  c = talk.religion.misc
>> 11   12   1    265  1    10   3    0    0    17   10   11   5    2    0    11   3    6    21   0    |  389  d = comp.graphics
>> 2    1    1    0    349  2    3    0    3    2    6    1    5    1    0    2    15   2    1    2    |  398  e = rec.motorcycles
>> 7    20   3    19   2    254  6    0    2    11   2    39   7    2    0    4    2    2    9    3    |  394  f = comp.os.ms-windows.misc
>> 2    1    13   0    0    0    247  0    1    1    3    0    6    2    4    0    2    3    5    29   |  319  g = alt.atheism
>> 1    1    0    0    2    0    2    361  0    1    2    0    2    0    0    1    3    22   0    1    |  399  h = rec.sport.hockey
>> 3    0    3    1    0    0    5    0    161  0    1    2    12   102  0    0    1    2    11   6    |  310  i = talk.politics.misc
>> 2    8    0    19   0    19   0    0    1    294  10   11   4    2    0    5    0    3    11   6    |  395  j = comp.windows.x
>> 2    10   0    1    1    0    0    0    0    1    347  13   2    1    0    5    3    2    2    0    |  390  k = misc.forsale
>> 1    36   0    6    1    25   0    0    1    6    10   257  2    1    0    34   6    0    6    0    |  392  l = comp.sys.ibm.pc.hardware
>> 2    2    2    2    1    0    12   0    0    6    10   4    312  5    2    13   11   3    3    6    |  396  m = sci.med
>> 2    0    3    2    1    0    0    1    13   0    5    1    2    314  2    0    2    2    10   4    |  364  n = talk.politics.guns
>> 1    0    2    1    1    0    34   1    33   1    3    0    1    8    271  1    4    5    6    3    |  376  o = talk.politics.mideast
>> 3    14   0    8    2    8    3    1    1    7    12   29   6    2    1    245  13   2    32   4    |  393  p = sci.electronics
>> 3    3    0    2    11   0    1    0    2    1    11   6    4    2    0    11   330  4    4    1    |  396  q = rec.autos
>> 0    0    1    0    1    0    4    12   3    1    3    0    0    0    0    5    6    359  1    1    |  397  r = rec.sport.baseball
>> 0    1    0    0    0    1    0    0    3    3    0    0    3    2    1    6    1    6    366  3    |  396  s = sci.crypt
>> 0    2    11   1    1    0    40   0    1    2    3    4    2    1    0    5    0    2    2    321  |  398  t = soc.religion.christian
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa                                       0.7073
>> Accuracy                                        75%
>> Reliability                                70.6238%
>> Reliability (standard deviation)            0.2187
>> Log-likelihood                mean      :    -1.1182
>>                               25%-ile   :    -1.6911
>>                               75%-ile   :    -0.0803
>>
>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>
>>
>>
>>
>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>
>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
>>> and a few other issues.
>>>
>>> We have removed asf-examples*.sh from trunk, as the sample file at the
>>> URL mentioned in your email is not available.
>>> This is something we need to fix and restore in 1.0.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
>>> wrote:
>>>
>>> from the asf-email-examples.sh script:
>>>
>>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
>>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>
>>> It looks like the:
>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>
>>> link is down.
>>>
>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>
>>>
>>>
>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>> > From: andrew.musselman@gmail.com
>>> > To: dev@mahout.apache.org
>>> >
>>> > Sure thing; continuing to smoke test the other examples tonight
>>> >
>>> >
>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
>>> suneel_marthi@yahoo.com>wrote:
>>> >
>>> > > Thanks, Andrew M. I see that some of the example scripts need to be
>>> > > fixed, as they still refer to the deprecated algorithms.
>>> > > I also see that Streaming KMeans has failed for you as well.
>>> > >
>>> > > I'll be rolling back the release today to fix these issues.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>> > > andrew.musselman@gmail.com> wrote:
>>> > >
>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
>>> 64-bit
>>> > > Linux AMI from tarball.
>>> > >
>>> > > All tests pass.
>>> > >
>>> > > *Output of examples:*
>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>> > > <http://mahout.apache.org>:*
>>> > > *recommendations:*
>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
>>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
>>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>> > > [snip]
>>> > >
>>> > > *clustering; kmeans:*
>>> > > [snip]
>>> > >         Weight : [props - optional]:  Point:
>>> > >         1.0 : [distance-squared=1.0193102046188427]:
>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>> 7573:0.204,
>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>> 9779:0.159,
>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>> > >         1.0 : [distance-squared=0.9823018320457279]:
>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>> 5336:0.106,
>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>> 7832:0.072,
>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>> > >         1.0 : [distance-squared=0.9509142993214911]:
>>> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>>> 7235:0.048,
>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>> 7683:0.077,
>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>> 10225:0.081,
>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>> > > 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>> > > [snip]
>>> > >
>>> > > *clustering; dirichlet:*
>>> > > Get this complaint:
>>> > > Running Dirichlet with K = 8
>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
>>> > > Unknown program 'dirichlet' chosen.
>>> > >
>>> > > *clustering: minhash:*
>>> > > Running Minhash
>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
>>> > > Unknown program 'minhash' chosen.
>>> > >
>>> > > *classification; standard:*
>>> > > =======================================================
>>> > > Summary
>>> > > -------------------------------------------------------
>>> > > Correctly Classified Instances          :       5384       87.7874%
>>> > > Incorrectly Classified Instances        :        749       12.2126%
>>> > > Total Classified Instances              :       6133
>>> > >
>>> > > =======================================================
>>> > > Confusion Matrix
>>> > > -------------------------------------------------------
>>> > > a       b       c       d       <--Classified as
>>> > > 2949    7       531     25       |  3512        a     = dev
>>> > > 0       0       0       0        |  0           b     = general
>>> > > 99      8       1763    8        |  1878        c     = user
>>> > > 41      1       29      672      |  743         d     = commits
>>> > >
>>> > > =======================================================
>>> > > Statistics
>>> > > -------------------------------------------------------
>>> > > Kappa                                       0.7877
>>> > > Accuracy                                   87.7874%
>>> > > Reliability                                 53.658%
>>> > > Reliability (standard deviation)            0.4911
>>> > >
>>> > > *classification; complementary:*
>>> > > =======================================================
>>> > > Summary
>>> > > -------------------------------------------------------
>>> > > Correctly Classified Instances          :       5530       90.1679%
>>> > > Incorrectly Classified Instances        :        603        9.8321%
>>> > > Total Classified Instances              :       6133
>>> > >
>>> > > =======================================================
>>> > > Confusion Matrix
>>> > > -------------------------------------------------------
>>> > > a       b       c       d       <--Classified as
>>> > > 3168    0       276     68       |  3512        a     = dev
>>> > > 0       0       0       0        |  0           b     = general
>>> > > 196     0       1652    30       |  1878        c     = user
>>> > > 25      0       8       710      |  743         d     = commits
>>> > >
>>> > > =======================================================
>>> > > Statistics
>>> > > -------------------------------------------------------
>>> > > Kappa                                       0.8259
>>> > > Accuracy                                   90.1679%
>>> > > Reliability                                54.7459%
>>> > > Reliability (standard deviation)            0.5005
>>> > >
>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>>> > >
>>> > > *classification; sgd, with three categories:*
>>> > > Running SGD Training
>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>> > > 24168 training files
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1       0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2       0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3       0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4       0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6       0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8       0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80      0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700     0.000   0.00    none
>>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800     0.000   0.00    none
>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1000    -0.607  75.78   none
>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1200    -0.607  75.78   none
>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1400    -0.607  75.78   none
>>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1500    -0.607  75.78   none
>>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08   1.0032261e-08   2000    -0.487  82.65   none
>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08   1.0011902e-08   2500    -0.439  83.90   none
>>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08   1.0011902e-08   3000    -0.439  83.90   none
>>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08   1.0000001e-08   4000    -0.351  88.14   none
>>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08   1.0000000e-08   5000    -0.378  87.10   none
>>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08   1.0000001e-08   6000    -0.372  86.89   none
>>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08   1.0000001e-08   7000    -0.334  89.26   none
>>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08   1.0000000e-08   8000    -0.368  87.52   none
>>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08   1.0000000e-08   10000   -0.374  87.39   none
>>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08   1.0000000e-08   12000   -0.298  88.26   none
>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> > >         at java.lang.Thread.run(Thread.java:701)
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>> > > andrew.musselman@gmail.com> wrote:
>>> > >
>>> > > > Trying out the build today
>>> > > >
>>> > > >
>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>> suneel_marthi@yahoo.com
>>> > > >wrote:
>>> > > >
>>> > > >> This is an issue (trivial one though) that needs to be fixed for
>>> 0.9
>>> > > >> Release, will be rerolling the release today (in the next few
>>> hrs) and
>>> > > >> putting out a new release candidate in staging.
>>> > > >>
>>> > > >> Thanks for reporting this Andrew P.
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>> > > ap.dev@outlook.com>
>>> > > >> wrote:
>>> > > >>
>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
>>> > > >> a bit of trouble getting the Hadoop natives to compile, and therefore may
>>> > > >> have run into some problems because of the Hadoop setup. I ran into some
>>> > > >> problems in the example scripts, particularly with
>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
>>> > > >> examples when I'm sure I've got Hadoop set up right.
>>> > > >>
>>> > > >>
>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
>>> > > >> $MAHOUT_LOCAL=true
>>> > > >> Hadoop 2.2.0
>>> > > >>
>>> > > >>
>>> > > >> a) Verify that u can unpack the release (tar or zip) [passed (tar)]
>>> > > >>
>>> > > >> b) Verify u r able to compile the distro
>>> > > >>
>>> > > >>     mvn compile [passed with warnings]
>>> > > >>
>>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
>>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>>> > > >>     [WARNING] Multiple versions of scala libraries detected!
>>> > > >>
>>> > > >> c)  Run through the unit tests: mvn clean test
>>> > > >>     mvn clean test [passed]
>>> > > >>
>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>>> > > >> Please run through all the different options in each script
>>> > > >>
>>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
>>> > > >>
>>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
>>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
>>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
>>> > > >>
>>> > > >>
>>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>> > > >>     [...]
>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>> > > >>         at java.lang.Class.forName0(Native Method)
>>> > > >>         at java.lang.Class.forName(Class.java:171)
>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>> > > >>
>>> > > >>
>>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>> > > >>
>>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>> > > >>         at java.lang.Class.forName0(Native Method)
>>> > > >>         at java.lang.Class.forName(Class.java:171)
>>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>>> > > >>
>>> > > >>
>>> > > >>     ./classify-20newsgroups.sh ->1 [works]
>>> > > >>     ./classify-20newsgroups.sh ->2 [works]
>>> > > >>
>>> > > >>
>>> > > >>     cluster-reuters.sh ->1 [works]
>>> > > >>     cluster-reuters.sh ->2 [works]
>>> > > >>     cluster-reuters.sh ->3 [works]
>>> > > >>
>>> > > >>     Same error as noted previously in the thread:
>>> > > >>
>>> > > >>     cluster-reuters.sh ->4 [0 clusters]
>>> > > >>
>>> > > >>     [...]
>>> > > >>
>>> > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>>> > > >>     Num clusters: 0; maxDistance: 0.000000
>>> > > >>     [Dunn Index] First: Infinity
>>> > > >>     [Davies-Bouldin Index] First: NaN
>>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
>>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>>> > > >> > From: suneel_marthi@yahoo.com
>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>>> > > >> >
>>> > > >> > Third time's a Charm!!!
>>> > > >> >
>>> > > >> >
>>> > > >> > Here's the new URL for Mahout 0.9 Release:
>>> > > >> >
>>> > > >>
>>> > >
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> > > >> >
>>> > > >> > For those volunteering to test this, some of the things to be
>>> > > verified:
>>> > > >> >
>>> > > >> > a) Verify that u can unpack the release (tar or zip)
>>> > > >> > b) Verify u r able to compile the distro
>>> > > >> > c)  Run through the unit tests: mvn clean test
>>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>>> > > >> >
>>> > > >> >
>>> > > >> > Committers and PMC members:
>>> > > >> > ---------------------------------------
>>> > > >> >
>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>>> > > >> >
>>> > > >> >
>>> > > >> > Thanks and Regards.
>>> > > >>
>>> > > >
>>> > > >
>>> > >
>>>
>>
>>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
*cluster-reuters.sh*
*k-means:*

[snip]
:VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016, 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
        Top Terms:
                banks                                   => 3.841823268955143
                bank                                    => 3.80633066361209
                debt                                    => 3.28065219870794
                said                                    => 2.5965700942088583
                he                                      => 2.335682813857497
                foreign                                 => 2.2217853688201403
                billion                                 => 2.1970193848291335
                would                                   => 1.9932392063955617
                loans                                   => 1.9309276792854233
                interest                                => 1.787324501938
                have                                    => 1.762981951432578
                its                                     => 1.7615109954971866
                which                                   => 1.5822081148036862
                has                                     => 1.5600708189041956
                dlrs                                    => 1.5571038313005996
                finance                                 => 1.5539758811252924
                new                                     => 1.5176015811577555
                had                                     => 1.5138723701401844
                brazil                                  => 1.5083369853593172
                payments                                => 1.4539044255886517
        Weight : [props - optional]:  Point:

:VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007, 0.40:0.003, 0.5:0.009, 0.57:0
        Top Terms:
                vs                                      => 6.126130791333171
                net                                     => 4.012191567277523
                cts                                     => 3.822006848832744
                shr                                     => 3.6786004856764527
                mln                                     => 2.9011643584038698
                loss                                    => 2.788368861463607
                qtr                                     => 2.714140225051522
                revs                                    => 2.4739861236454717
                profit                                  => 1.8146888090247015
                note                                    => 1.7977163272138388
                dlrs                                    => 1.6164390808155846
                avg                                     => 1.3901765773336587
                shrs                                    => 1.3856326531419314
                mths                                    => 1.3168717272038506
                4th                                     => 1.2161158425617289
                oper                                    => 1.182419473776814
                year                                    => 1.178086061733047
                nine                                    => 1.0670554836445316
                3rd                                     => 1.041334410056592
                inc                                     => 1.0019361981554935
        Weight : [props - optional]:  Point:


Inter-Cluster Density: 0.45562152681859414
Intra-Cluster Density: 0.6952712632167628
CDbw Inter-Cluster Density: 0.0
CDbw Intra-Cluster Density: 16.486930227598684
CDbw Separation: 194.49005884464628

*fuzzy k-means:*
:SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001, 0.01:0.005, 0.02:0.002, 0.0
        Top Terms:
                said                                    => 1.8665592354713065
                its                                     => 1.1335212213411592
                pct                                     => 1.0862816801353348
                dlrs                                    => 1.0854998884993752
                mln                                     => 1.043163996400643
                from                                    => 0.9684961110525736
                has                                     => 0.912161511978058
                company                                 => 0.8754186972808333
                mar                                     => 0.8675333452422878
                inc                                     => 0.7678617590362815
                would                                   => 0.7610968883652675
                he                                      => 0.7459988770503974
                which                                   => 0.7435613119406804
                year                                    => 0.7302840632748394
                u.s                                     => 0.7281061062439116
                shares                                  => 0.7260764102983083
                corp                                    => 0.7179807367808658
                new                                     => 0.7044203783157115
                stock                                   => 0.6962010978721442
                have                                    => 0.6464265467298506
:SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001, 0.01:0.004, 0.02:0.002, 0.02
        Top Terms:
                said                                    => 1.864911184196927
                dlrs                                    => 1.199286689822081
                mln                                     => 1.1802134783562215
                pct                                     => 1.1529704214798124
                its                                     => 1.1184398851519701
                from                                    => 1.016647848050332
                company                                 => 0.894703604722841
                mar                                     => 0.879986159541356
                has                                     => 0.8642799128491316
                year                                    => 0.8271823503717782
                inc                                     => 0.7871293745341424
                corp                                    => 0.737705498468879
                which                                   => 0.722975201852743
                would                                   => 0.708000816484415
                u.s                                     => 0.7073294276173905
                billion                                 => 0.7055723996916351
                he                                      => 0.7042684217823294
                new                                     => 0.6834737905434939
                shares                                  => 0.6753327384172428
                stock                                   => 0.6576225144041699
:SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001, 0.01:0.006, 0.02:0.002, 0.02
        Top Terms:
                said                                    => 1.8796076179735086
                its                                     => 1.172025965452378
                dlrs                                    => 1.130422792460914
                pct                                     => 1.082038255241358
                mln                                     => 1.0772146872767114
                company                                 => 0.9662235879639138
                from                                    => 0.9473172871605616
                has                                     => 0.9224712965830099
                mar                                     => 0.8769325856924421
                inc                                     => 0.8360245257169788
                shares                                  => 0.8334595641384324
                stock                                   => 0.7704621839612175
                corp                                    => 0.7682400250301806
                which                                   => 0.7389988207856137
                would                                   => 0.7339708917389389
                year                                    => 0.7088414843731325
                new                                     => 0.7038109468655172
                he                                      => 0.6993994455501005
                u.s                                     => 0.6772649147622415
                share                                   => 0.6241804830055171

*lda:*

[snip]
21539
{0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
21540
{0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
21541
{0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
21542
{0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
21543
{0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
21544
{0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
21545
{0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
21546
{0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
21547
{0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
21548
{0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
21549
{0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
21550
{0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
21551
{0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
21552
{0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
21553
{0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
21554
{0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
21555
{0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
21556
{0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
21557
{0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
21558
{0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
21559
{0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
21560
{0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
21561
{0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
21562
{0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
21563
{0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
21564
{0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
21565
{0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
21566
{0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
21567
{0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
21568
{0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
21569
{0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
21570
{0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
21571
{0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
21572
{0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
21573
{0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
21574
{0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
21575
{0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
21576
{0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
21577
{0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}


*Streaming k-means:*

[snip]
INFO: Number of Centroids: 0
Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local23982482_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
        at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
        at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
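
A note on the exception text above: the `%.1f %% of %d` placeholders show up unexpanded, with the values trailing in square brackets. Guava's `Preconditions.checkArgument` only substitutes `%s` in its message template and appends unmatched arguments in brackets, which is consistent with what the log shows. As a rough sketch in Python (this is not Mahout code, and the variable names are made up for illustration), here is what the message would presumably read if the template were expanded printf-style:

```python
# Template and arguments copied from the exception message in the log above.
template = ("Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test")
args = (10.000000149011612, 0)

# printf-style expansion, which is what the template appears to expect.
# (Assumption: the author intended String.format-like semantics.)
intended_message = template % args
print(intended_message)
```

Read this way, the reducer asked for a 10% test split out of 0 vectors, which lines up with the "Number of Centroids: 0" line logged just before the failure.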

[snip]

WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 535 ms (Minutes: 0.008916666666666666)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
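
The "Infinity" and "NaN" index values above look like a side effect of having zero clusters rather than separate problems. A minimal sketch of why, assuming the textbook definitions (Dunn: smallest inter-cluster distance over largest intra-cluster distance; Davies-Bouldin: mean of per-cluster scores) and Java-style floating-point division. I have not checked how Mahout actually computes these, so the function names and the empty-input defaults here are assumptions:

```python
import math

def ieee_div(num, den):
    """Divide the way Java doubles do: x/0.0 yields Infinity or NaN, no exception.
    (Simplification: the sign of a negative zero denominator is ignored.)"""
    if den == 0.0:
        if num == 0.0 or math.isnan(num):
            return math.nan
        return math.copysign(math.inf, num)
    return num / den

def dunn_index(inter_distances, intra_distances):
    # Assumed defaults for empty input: min starts at +inf, max starts at 0.
    min_inter = min(inter_distances, default=math.inf)
    max_intra = max(intra_distances, default=0.0)
    return ieee_div(min_inter, max_intra)

def davies_bouldin_index(per_cluster_scores):
    # Mean of the per-cluster scores; with no clusters this is 0/0.
    return ieee_div(sum(per_cluster_scores), float(len(per_cluster_scores)))

# With zero clusters there are no distances and no scores at all.
print(dunn_index([], []))
print(davies_bouldin_index([]))
```

So once streaming k-means produces 0 centroids, the quality summary has nothing to work with, and the empty CSV after the header follows from the same cause.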


On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> *classify-20newsgroups.sh*
>
> *Complementary naive bayes:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :      11207       98.9406%
> Incorrectly Classified Instances        :        120        1.0594%
> Total Classified Instances              :      11327
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
> 475  0    0    1    0    0    0    0    0    0    0    0    0    0    1    0    1    0    0    0    |  478  a = alt.atheism
> 0    597  1    1    0    1    1    0    0    0    0    1    0    2    1    0    0    0    0    0    |  605  b = comp.graphics
> 0    1    620  3    0    1    0    0    0    0    0    1    0    0    1    0    0    0    0    0    |  627  c = comp.os.ms-windows.misc
> 1    1    1    593  2    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    |  599  d = comp.sys.ibm.pc.hardware
> 0    1    1    0    568  0    1    0    0    0    1    1    2    0    0    0    0    1    0    0    |  576  e = comp.sys.mac.hardware
> 0    4    2    0    0    581  0    0    0    0    0    0    0    0    0    0    0    0    0    0    |  587  f = comp.windows.x
> 0    0    0    1    2    0    571  3    0    0    1    1    4    1    0    0    0    0    0    0    |  584  g = misc.forsale
> 0    0    0    1    0    0    0    589  1    0    0    1    1    0    0    0    0    0    0    0    |  593  h = rec.autos
> 0    0    0    0    0    0    0    1    565  0    0    0    0    0    1    0    0    0    0    0    |  567  i = rec.motorcycles
> 0    0    0    0    0    0    0    0    0    600  2    0    0    0    1    0    0    0    0    0    |  603  j = rec.sport.baseball
> 0    0    0    0    0    0    0    0    0    1    584  0    0    0    0    0    0    0    0    0    |  585  k = rec.sport.hockey
> 0    0    0    0    0    0    0    0    0    0    0    579  0    0    0    0    0    1    0    0    |  580  l = sci.crypt
> 0    0    0    1    3    0    2    0    0    2    0    0    567  1    2    1    0    0    0    0    |  579  m = sci.electronics
> 0    0    0    0    0    0    0    0    0    0    0    0    1    605  0    0    0    0    0    0    |  606  n = sci.med
> 0    0    0    0    0    0    0    0    0    0    0    0    0    0    602  0    0    0    0    0    |  602  o = sci.space
> 0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    602  0    0    1    0    |  604  p = soc.religion.christian
> 0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    556  0    0    0    |  556  q = talk.politics.mideast
> 0    0    1    0    0    0    0    0    0    0    0    1    0    0    1    0    0    568  0    0    |  571  r = talk.politics.guns
> 11   0    0    0    0    0    0    0    0    1    0    0    0    1    3    8    1    4    338  2    |  369  s = talk.religion.misc
> 0    0    0    0    0    0    0    0    0    0    1    0    0    0    1    0    3    4    0    447  |  456  t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.9806
> Accuracy                                   98.9406%
> Reliability                                94.0932%
> Reliability (standard deviation)            0.2163
>
> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 15870 ms (Minutes: 0.2645)
> + echo 'Testing on holdout set'
> Testing on holdout set
> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>
> [snip]
>
> INFO: Complementary Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       6715       89.3071%
> Incorrectly Classified Instances        :        804       10.6929%
> Total Classified Instances              :       7519
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
> 298  0    0    0    0    0    0    0    0    1    0    0    0    1    2    5    1    0    13   0    |  321  a = alt.atheism
> 0    298  11   6    1    12   2    2    1    1    3    8    3    4    2    4    1    4    4    1    |  368  b = comp.graphics
> 1    17   286  16   4    9    6    3    2    0    1    0    1    7    1    0    2    1    0    1    |  358  c = comp.os.ms-windows.misc
> 2    6    11   309  9    5    14   8    1    0    2    0    6    4    2    0    1    2    1    0    |  383  d = comp.sys.ibm.pc.hardware
> 0    10   8    7    334  7    5    5    2    0    3    0    2    1    1    0    1    1    0    0    |  387  e = comp.sys.mac.hardware
> 1    13   7    8    2    355  2    0    2    0    0    5    1    1    3    0    0    1    0    0    |  401  f = comp.windows.x
> 0    7    11   29   12   9    268  16   8    4    3    2    6    4    2    1    3    1    2    3    |  391  g = misc.forsale
> 0    1    0    0    3    0    7    362  8    2    2    1    2    0    2    0    1    2    0    4    |  397  h = rec.autos
> 0    0    0    1    0    0    1    0    423  0    0    0    2    1    0    1    0    0    0    0    |  429  i = rec.motorcycles
> 0    0    1    0    0    0    0    2    2    371  8    0    2    3    0    2    0    0    0    0    |  391  j = rec.sport.baseball
> 0    0    1    0    0    0    1    0    0    2    409  0    0    0    0    0    0    0    0    1    |  414  k = rec.sport.hockey
> 0    0    1    2    1    0    1    0    0    0    0    404  0    0    0    0    0    1    0    1    |  411  l = sci.crypt
> 0    5    4    11   1    3    7    9    2    5    3    3    339  2    6    0    1    1    2    1    |  405  m = sci.electronics
> 0    4    0    1    0    0    0    1    0    1    1    0    3    367  3    1    2    0    0    0    |  384  n = sci.med
> 0    1    2    0    1    0    2    0    0    1    0    0    1    1    375  0    1    0    0    0    |  385  o = sci.space
> 4    2    1    1    0    0    1    1    2    0    0    1    1    5    1    367  4    0    1    1    |  393  p = soc.religion.christian
> 0    1    0    0    0    0    0    0    0    2    0    0    0    0    0    2    378  0    1    0    |  384  q = talk.politics.mideast
> 0    0    0    0    0    2    1    1    1    1    0    3    0    3    0    0    2    319  2    4    |  339  r = talk.politics.guns
> 32   0    0    1    0    0    0    0    0    1    1    1    0    2    2    26   5    7    175  6    |  259  s = talk.religion.misc
> 0    0    0    2    0    0    0    0    0    1    2    2    0    1    2    1    10   18   2    278  |  319  t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.8594
> Accuracy                                   89.3071%
> Reliability                                 84.611%
> Reliability (standard deviation)            0.2148
>
> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>
>
> *Naive bayes:*
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :      11286       99.0869%
> Incorrectly Classified Instances        :        104        0.9131%
> Total Classified Instances              :      11390
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>       k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 474     0       0       0       0       0       0       0       0       0
>       0       0       0       0       0       0       0       0       2
>  1         |  477         a     = alt.atheism
> 0       566     0       2       0       1       0       0       0       0
>       0       0       0       0       0       0       0       0       0
>  0         |  569         b     = comp.graphics
> 0       10      590     29      2       4       1       0       0       0
>       0       0       1       0       0       0       0       0       0
>  1         |  638         c     = comp.os.ms-windows.misc
> 0       0       0       596     0       0       0       0       0       0
>       0       0       0       0       0       0       0       0       0
>  0         |  596         d     = comp.sys.ibm.pc.hardware
> 0       0       0       0       575     0       1       0       0       0
>       0       0       1       0       0       0       0       0       0
>  0         |  577         e     = comp.sys.mac.hardware
> 0       2       2       2       0       593     1       0       0       0
>       0       0       0       0       1       0       0       0       0
>  0         |  601         f     = comp.windows.x
> 0       0       0       1       0       0       589     1       0       0
>       1       0       2       0       0       0       0       0       0
>  0         |  594         g     = misc.forsale
> 0       0       0       0       0       0       0       594     0       0
>       0       0       0       0       0       0       0       0       0
>  0         |  594         h     = rec.autos
> 0       0       0       0       0       0       0       0       611     0
>       0       0       0       0       0       0       0       0       0
>  0         |  611         i     = rec.motorcycles
> 0       0       0       0       0       0       0       0       0       616
>       1       0       0       0       0       0       0       0       0
>  0         |  617         j     = rec.sport.baseball
> 0       0       0       0       0       0       1       0       0       0
>       620     0       0       0       0       0       0       0       0
>  0         |  621         k     = rec.sport.hockey
> 0       0       0       0       0       0       0       0       0       0
>       0       580     0       0       0       0       0       1       0
>  0         |  581         l     = sci.crypt
> 0       0       0       3       1       0       0       0       0       0
>       0       0       571     0       0       0       0       0       0
>  0         |  575         m     = sci.electronics
> 0       0       0       0       0       0       0       0       0       0
>       0       0       2       583     0       0       0       0       0
>  0         |  585         n     = sci.med
> 0       0       0       0       0       0       0       0       0       0
>       0       0       0       1       599     0       0       0       0
>  0         |  600         o     = sci.space
> 0       1       0       0       0       0       0       0       0       0
>       0       0       0       0       0       615     0       0       0
>  0         |  616         p     = soc.religion.christian
> 1       0       0       0       0       0       0       0       0       0
>       0       0       0       0       0       1       560     0       0
>  0         |  562         q     = talk.politics.mideast
> 0       0       1       0       0       0       0       0       0       0
>       0       1       0       0       0       0       0       548     0
>  1         |  551         r     = talk.politics.guns
> 10      0       0       0       0       0       0       0       0       0
>       0       0       0       0       1       1       0       2       344
>  1         |  359         s     = talk.religion.misc
> 0       0       0       0       0       0       0       0       0       0
>       0       1       1       0       0       0       0       2       0
>  462       |  466         t     = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.9847
> Accuracy                                   99.0869%
> Reliability                                94.3334%
> Reliability (standard deviation)            0.2169
>
> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 14304 ms (Minutes: 0.2384)
> + echo 'Testing on holdout set'
> Testing on holdout set
>
> [snip]
>
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       6718       90.1019%
> Incorrectly Classified Instances        :        738        9.8981%
> Total Classified Instances              :       7456
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>       k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 294     0       0       0       0       0       0       0       0       0
>       0       2       0       1       1       6       1       1       16
>   0         |  322         a     = alt.atheism
> 0       345     6       14      6       11      6       0       0       0
>       0       5       7       1       3       0       0       0       0
>  0         |  404         b     = comp.graphics
> 2       29      177     78      22      19      9       1       0       0
>       0       4       2       0       1       1       0       0       1
>  1         |  347         c     = comp.os.ms-windows.misc
> 1       9       2       335     18      2       10      0       0       0
>       1       0       8       0       0       0       0       0       0
>  0         |  386         d     = comp.sys.ibm.pc.hardware
> 1       4       2       13      347     3       5       1       0       0
>       1       0       7       1       0       0       0       1       0
>  0         |  386         e     = comp.sys.mac.hardware
> 0       20      0       4       0       352     4       0       0       0
>       0       0       1       1       3       0       1       0       1
>  0         |  387         f     = comp.windows.x
> 0       2       0       21      5       1       323     7       2       2
>       0       2       12      0       3       0       0       0       0
>  1         |  381         g     = misc.forsale
> 0       1       0       0       1       0       15      363     8       1
>       0       0       4       1       0       0       0       1       0
>  1         |  396         h     = rec.autos
> 0       1       0       0       0       0       6       6       370     0
>       0       0       0       1       0       0       0       0       1
>  0         |  385         i     = rec.motorcycles
> 1       0       0       1       1       0       2       1       2       362
>       5       0       2       0       0       0       0       0       0
>  0         |  377         j     = rec.sport.baseball
> 0       0       0       1       2       0       0       0       0       3
>       371     0       0       0       0       0       0       0       0
>  1         |  378         k     = rec.sport.hockey
> 0       3       1       0       1       0       2       0       0       0
>       0       396     0       1       0       0       1       1       1
>  3         |  410         l     = sci.crypt
> 0       7       0       7       7       2       6       4       0       0
>       0       1       369     2       2       0       0       0       0
>  2         |  409         m     = sci.electronics
> 0       3       0       2       1       0       2       0       0       0
>       0       1       4       383     4       0       0       1       0
>  4         |  405         n     = sci.med
> 0       5       0       0       1       0       3       0       0       0
>       0       0       1       0       374     1       0       0       1
>  1         |  387         o     = sci.space
> 6       2       0       1       1       0       0       1       0       1
>       0       0       1       5       0       352     2       1       7
>  1         |  381         p     = soc.religion.christian
> 1       1       0       0       0       0       0       0       0       0
>       1       0       0       0       0       0       373     1       0
>  1         |  378         q     = talk.politics.mideast
> 0       0       0       0       0       0       1       0       1       0
>       0       2       0       0       0       0       0       346     2
>  7         |  359         r     = talk.politics.guns
> 26      1       0       1       0       0       0       2       0       1
>       1       0       0       1       1       20      2       6       200
>  7         |  269         s     = talk.religion.misc
> 1       0       0       0       0       0       0       2       0       0
>       1       0       0       2       2       0       1       14      0
>  286       |  309         t     = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.8726
> Accuracy                                   90.1019%
> Reliability                                85.4491%
> Reliability (standard deviation)            0.2222
>
> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10878 ms (Minutes: 0.1813)
>
> *SGD:*
> 7532 test files
>
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       5649            75%
> Incorrectly Classified Instances        :       1883            25%
> Total Classified Instances              :       7532
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       i       j
>       k       l       m       n       o       p       q       r       s
>  t        <--Classified as
> 186     6       3       10      5       0       33      4       13      15
>      7       1       24      15      3       15      5       5       29
> 15        |  394         a     = sci.space
> 5       309     0       3       2       5       0       0       0       1
>       9       21      2       0       0       18      4       4       1
>  1         |  385         b     = comp.sys.mac.hardware
> 4       1       101     3       0       1       63      0       7       0
>       1       1       5       16      3       0       3       7       1
>  34        |  251         c     = talk.religion.misc
> 11      12      1       265     1       10      3       0       0       17
>      10      11      5       2       0       11      3       6       21
> 0         |  389         d     = comp.graphics
> 2       1       1       0       349     2       3       0       3       2
>       6       1       5       1       0       2       15      2       1
>  2         |  398         e     = rec.motorcycles
> 7       20      3       19      2       254     6       0       2       11
>      2       39      7       2       0       4       2       2       9
>  3         |  394         f     = comp.os.ms-windows.misc
> 2       1       13      0       0       0       247     0       1       1
>       3       0       6       2       4       0       2       3       5
>  29        |  319         g     = alt.atheism
> 1       1       0       0       2       0       2       361     0       1
>       2       0       2       0       0       1       3       22      0
>  1         |  399         h     = rec.sport.hockey
> 3       0       3       1       0       0       5       0       161     0
>       1       2       12      102     0       0       1       2       11
>   6         |  310         i     = talk.politics.misc
> 2       8       0       19      0       19      0       0       1       294
>       10      11      4       2       0       5       0       3       11
>  6         |  395         j     = comp.windows.x
> 2       10      0       1       1       0       0       0       0       1
>       347     13      2       1       0       5       3       2       2
>  0         |  390         k     = misc.forsale
> 1       36      0       6       1       25      0       0       1       6
>       10      257     2       1       0       34      6       0       6
>  0         |  392         l     = comp.sys.ibm.pc.hardware
> 2       2       2       2       1       0       12      0       0       6
>       10      4       312     5       2       13      11      3       3
>  6         |  396         m     = sci.med
> 2       0       3       2       1       0       0       1       13      0
>       5       1       2       314     2       0       2       2       10
>   4         |  364         n     = talk.politics.guns
> 1       0       2       1       1       0       34      1       33      1
>       3       0       1       8       271     1       4       5       6
>  3         |  376         o     = talk.politics.mideast
> 3       14      0       8       2       8       3       1       1       7
>       12      29      6       2       1       245     13      2       32
>   4         |  393         p     = sci.electronics
> 3       3       0       2       11      0       1       0       2       1
>       11      6       4       2       0       11      330     4       4
>  1         |  396         q     = rec.autos
> 0       0       1       0       1       0       4       12      3       1
>       3       0       0       0       0       5       6       359     1
>  1         |  397         r     = rec.sport.baseball
> 0       1       0       0       0       1       0       0       3       3
>       0       0       3       2       1       6       1       6       366
>  3         |  396         s     = sci.crypt
> 0       2       11      1       1       0       40      0       1       2
>       3       4       2       1       0       5       0       2       2
>  321       |  398         t     = soc.religion.christian
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.7073
> Accuracy                                        75%
> Reliability                                70.6238%
> Reliability (standard deviation)            0.2187
> Log-likelihood                mean      :    -1.1182
>                               25%-ile   :    -1.6911
>                               75%-ile   :    -0.0803
>
> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>
>
>
>
> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>
>> Thanks Andrew for reporting that. I rolled back the release to fix this
>> and a few other issues.
>>
>> We have removed asf-examples*.sh from trunk as the sample file at the URL
>> mentioned in your email is not available.
>> This is something we need to fix and restore in 1.0.
>>
>>
>>
>>
>>
>>
>>
>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>>
>> from the asf-email-examples.sh script:
>>
>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>
>> It looks like the link
>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>> is down.
>>
>> Is there somewhere else that we can get a subset of the ASF emails?
>>
>>
>>
>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>> > Subject: Re: MAHOUT 0.9 Release - New URL
>> > From: andrew.musselman@gmail.com
>> > To: dev@mahout.apache.org
>> >
>> > Sure thing; continuing to smoke test the other examples tonight
>> >
>> >
>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
>> >
>> > > Thanks Andrew M. I see that some of the example scripts need to be fixed, as
>> > > they still refer to the deprecated algorithms.
>> > > I see that Streaming KMeans has failed for you as well.
>> > >
>> > > I'll be rolling back the release today to fix these issues.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>> > > andrew.musselman@gmail.com> wrote:
>> > >
>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
>> > > Linux AMI from tarball.
>> > >
>> > > All tests pass.
>> > >
>> > > *Output of examples:*
>> > > *asf-email-examples.sh, run on mahout.apache.org
>> > > <http://mahout.apache.org>:*
>> > > *recommendations:*
>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
>> > > 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>> > > 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>> > > 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>> > > 8       [12758:1.0,19409:1.0,11112:1.0]
>> > > 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>> > > 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>> > > 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>> > > 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>> > > 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>> > > 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>> > > [snip]
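
For anyone who wants to post-process recommendation output like the listing above, here is a minimal Python sketch. It assumes each record sits on one physical line in the form "userID, whitespace, [itemID:score,...]" (the line breaks in this thread come from mail wrapping). The function name parse_recommendation is illustrative only and is not part of Mahout.

```python
def parse_recommendation(line):
    """Parse one 'userID [itemID:score,...]' record into (user_id, [(item_id, score), ...]).

    Assumption: the user ID and the bracketed list are separated by
    whitespace and the whole record is on a single line.
    """
    user, items = line.split(None, 1)
    items = items.strip().strip("[]")
    pairs = []
    if items:  # an empty list "[]" yields no recommendations
        for entry in items.split(","):
            item_id, score = entry.split(":")
            pairs.append((int(item_id), float(score)))
    return int(user), pairs


if __name__ == "__main__":
    # Sample record copied from the output for user 8.
    user_id, recs = parse_recommendation("8\t[12758:1.0,19409:1.0,11112:1.0]")
    print(user_id, recs)
```

All scores in the quoted sample are 1.0, so the order of the items within each list is the only ranking information available there.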
>> > >
>> > > *clustering; kmeans:*
>> > > [snip]
>> > >         Weight : [props - optional]:  Point:
>> > >         1.0 : [distance-squared=1.0193102046188427]:
>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>> 7573:0.204,
>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>> 9779:0.159,
>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>> > >         1.0 : [distance-squared=0.9823018320457279]:
>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>> 5336:0.106,
>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>> 7832:0.072,
>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>> > >         1.0 : [distance-squared=0.9509142993214911]:
>> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>> 7235:0.048,
>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>> 7683:0.077,
>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>> 10225:0.081,
>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>> > > 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>> > > [snip]
>> > >
>> > > *clustering; dirichlet:*
>> > > Get this complaint:
>> > > Running Dirichlet with K = 8
>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
>> > > Unknown program 'dirichlet' chosen.
>> > >
>> > > *clustering: minhash:*
>> > > Running Minhash
>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
>> > > Unknown program 'minhash' chosen.
>> > >
>> > > *classification; standard:*
>> > > =======================================================
>> > > Summary
>> > > -------------------------------------------------------
>> > > Correctly Classified Instances          :       5384       87.7874%
>> > > Incorrectly Classified Instances        :        749       12.2126%
>> > > Total Classified Instances              :       6133
>> > >
>> > > =======================================================
>> > > Confusion Matrix
>> > > -------------------------------------------------------
>> > > a       b       c       d       <--Classified as
>> > > 2949    7       531     25       |  3512        a     = dev
>> > > 0       0       0       0        |  0           b     = general
>> > > 99      8       1763    8        |  1878        c     = user
>> > > 41      1       29      672      |  743         d     = commits
>> > >
>> > > =======================================================
>> > > Statistics
>> > > -------------------------------------------------------
>> > > Kappa                                       0.7877
>> > > Accuracy                                   87.7874%
>> > > Reliability                                 53.658%
>> > > Reliability (standard deviation)            0.4911
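
As a cross-check on the summary figures above, here is a minimal Python sketch that recomputes accuracy from the four-category confusion matrix and applies the textbook Cohen's kappa formula. The function names are illustrative and not part of Mahout. How Mahout computes its own Kappa line is not shown in this thread, so the textbook value is an assumption and may differ slightly from the number Mahout printed.

```python
# Rows are actual classes, columns are predicted classes,
# copied from the "classification; standard" confusion matrix above.
MATRIX = [
    [2949, 7, 531, 25],   # a = dev
    [0, 0, 0, 0],         # b = general
    [99, 8, 1763, 8],     # c = user
    [41, 1, 29, 672],     # d = commits
]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Textbook Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(f"accuracy = {accuracy(MATRIX):.4%}")
    print(f"kappa    = {cohen_kappa(MATRIX):.4f}")
```

The accuracy is simply the 5384 correctly classified instances divided by the 6133 total, which matches the Summary block.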
>> > >
>> > > *classification; complementary:*
>> > > =======================================================
>> > > Summary
>> > > -------------------------------------------------------
>> > > Correctly Classified Instances          :       5530       90.1679%
>> > > Incorrectly Classified Instances        :        603        9.8321%
>> > > Total Classified Instances              :       6133
>> > >
>> > > =======================================================
>> > > Confusion Matrix
>> > > -------------------------------------------------------
>> > > a       b       c       d       <--Classified as
>> > > 3168    0       276     68       |  3512        a     = dev
>> > > 0       0       0       0        |  0           b     = general
>> > > 196     0       1652    30       |  1878        c     = user
>> > > 25      0       8       710      |  743         d     = commits
>> > >
>> > > =======================================================
>> > > Statistics
>> > > -------------------------------------------------------
>> > > Kappa                                       0.8259
>> > > Accuracy                                   90.1679%
>> > > Reliability                                54.7459%
>> > > Reliability (standard deviation)            0.5005
>> > >
>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>> > >
>> > > *classification; sgd, with three categories:*
>> > > Running SGD Training
>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
>> > > 24168 training files
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
>> > > 0.000   0.00    none
>> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
>> > > 0.000   0.00    none
>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>> > > 1.0019413e-08   1000    -0.607  75.78   none
>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>> > > 1.0019413e-08   1200    -0.607  75.78   none
>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>> > > 1.0019413e-08   1400    -0.607  75.78   none
>> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
>> > > 1.0019413e-08   1500    -0.607  75.78   none
>> > > 0.24    43686.00        17924.00        329.50  1.0571799e-08
>> > > 1.0032261e-08   2000    -0.487  82.65   none
>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
>> > > 1.0011902e-08   2500    -0.439  83.90   none
>> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
>> > > 1.0011902e-08   3000    -0.439  83.90   none
>> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
>> > > 1.0000001e-08   4000    -0.351  88.14   none
>> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
>> > > 1.0000000e-08   5000    -0.378  87.10   none
>> > > 0.32    50635.00        36461.00        437.09  1.0556652e-08
>> > > 1.0000001e-08   6000    -0.372  86.89   none
>> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
>> > > 1.0000001e-08   7000    -0.334  89.26   none
>> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
>> > > 1.0000000e-08   8000    -0.368  87.52   none
>> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
>> > > 1.0000000e-08   10000   -0.374  87.39   none
>> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
>> > > 1.0000000e-08   12000   -0.298  88.26   none
>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>> > >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>> > >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > >         at java.lang.reflect.Method.invoke(Method.java:622)
>> > >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>> > >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>> > >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>> > >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>> > >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>> > >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>> > >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>> > >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >         at java.lang.Thread.run(Thread.java:701)
>> > >
>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>> > > andrew.musselman@gmail.com> wrote:
>> > >
>> > > > Trying out the build today
>> > > >
>> > > >
>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
>> > > >
>> > > >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
>> > > >> release. I will be rerolling the release today (in the next few hours) and
>> > > >> putting out a new release candidate in staging.
>> > > >>
>> > > >> Thanks for reporting this, Andrew P.
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
>> > > >>
>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
>> > > >> have run into some problems because of the Hadoop setup. I ran into some
>> > > >> problems in the example scripts, particularly with
>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
>> > > >> examples when I'm sure I've got Hadoop set up right.
>> > > >>
>> > > >>
>> > > >> Apache Maven 3.1.2-SNAPSHOT
>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
>> > > >> $MAHOUT_LOCAL=true
>> > > >> Hadoop 2.2.0
>> > > >>
>> > > >>
>> > > >> a) Verify that u can unpack the release (tar or zip) ... passed (tar)
>> > > >> [passed]
>> > > >>
>> > > >> b) Verify u r able to compile the distro
>> > > >>
>> > > >>     mvn compile - [passed with warnings]
>> > > >>
>> > > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
>> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>> > > >>     [WARNING] Multiple versions of scala libraries detected!
>> > > >>
>> > > >> c)  Run through the unit tests: mvn clean test
>> > > >>     mvn clean test [passed]
>> > > >>
>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>> > > >> Please run through all the different options in each script
>> > > >>
>> > > >>     Running example scripts with $MAHOUT_LOCAL=true
>> > > >>
>> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
>> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
>> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
>> > > >>
>> > > >>
>> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>> > > >>     [...]
>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> > > >>         at java.lang.Class.forName0(Native Method)
>> > > >>         at java.lang.Class.forName(Class.java:171)
>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>> > > >>
>> > > >>
>> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>> > > >>
>> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> > > >>         at java.security.AccessController.doPrivileged(Native Method)
>> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> > > >>         at java.lang.Class.forName0(Native Method)
>> > > >>         at java.lang.Class.forName(Class.java:171)
>> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>> > > >>
>> > > >>
>> > > >>     ./classify-20newsgroups.sh ->1 [works]
>> > > >>     ./classify-20newsgroups.sh ->2 [works]
>> > > >>
>> > > >>
>> > > >>     cluster-reuters.sh ->1 [works]
>> > > >>     cluster-reuters.sh ->2 [works]
>> > > >>     cluster-reuters.sh ->3 [works]
>> > > >>
>> > > >>     Same error as noted previously in the thread:
>> > > >>
>> > > >>     cluster-reuters.sh ->4 [0 clusters]
>> > > >>
>> > > >>     [...]
>> > > >>
>> > > >>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> > > >>     Num clusters: 0; maxDistance: 0.000000
>> > > >>     [Dunn Index] First: Infinity
>> > > >>     [Davies-Bouldin Index] First: NaN
>> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
>> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>> > > >> > From: suneel_marthi@yahoo.com
>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>> > > >> >
>> > > >> > Third time's a Charm!!!
>> > > >> >
>> > > >> >
>> > > >> > Here's the new URL for Mahout 0.9 Release:
>> > > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>> > > >> >
>> > > >> > For those volunteering to test this, some of the things to be verified:
>> > > >> >
>> > > >> > a) Verify that u can unpack the release (tar or zip)
>> > > >> > b) Verify u r able to compile the distro
>> > > >> > c)  Run through the unit tests: mvn clean test
>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>> > > >> >
>> > > >> >
>> > > >> > Committers and PMC members:
>> > > >> > ---------------------------------------
>> > > >> >
>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>> > > >> >
>> > > >> >
>> > > >> > Thanks and Regards.
>> > > >>
>> > > >
>> > > >
>> > >
>>
>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
*classify-20newsgroups.sh*

*Complementary naive bayes:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :      11207       98.9406%
Incorrectly Classified Instances        :        120        1.0594%
Total Classified Instances              :      11327

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
475     0       0       1       0       0       0       0       0       0       0       0       0       0       1       0       1       0       0       0       |  478         a     = alt.atheism
0       597     1       1       0       1       1       0       0       0       0       1       0       2       1       0       0       0       0       0       |  605         b     = comp.graphics
0       1       620     3       0       1       0       0       0       0       0       1       0       0       1       0       0       0       0       0       |  627         c     = comp.os.ms-windows.misc
1       1       1       593     2       0       0       0       0       0       0       0       0       0       0       1       0       0       0       0       |  599         d     = comp.sys.ibm.pc.hardware
0       1       1       0       568     0       1       0       0       0       1       1       2       0       0       0       0       1       0       0       |  576         e     = comp.sys.mac.hardware
0       4       2       0       0       581     0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  587         f     = comp.windows.x
0       0       0       1       2       0       571     3       0       0       1       1       4       1       0       0       0       0       0       0       |  584         g     = misc.forsale
0       0       0       1       0       0       0       589     1       0       0       1       1       0       0       0       0       0       0       0       |  593         h     = rec.autos
0       0       0       0       0       0       0       1       565     0       0       0       0       0       1       0       0       0       0       0       |  567         i     = rec.motorcycles
0       0       0       0       0       0       0       0       0       600     2       0       0       0       1       0       0       0       0       0       |  603         j     = rec.sport.baseball
0       0       0       0       0       0       0       0       0       1       584     0       0       0       0       0       0       0       0       0       |  585         k     = rec.sport.hockey
0       0       0       0       0       0       0       0       0       0       0       579     0       0       0       0       0       1       0       0       |  580         l     = sci.crypt
0       0       0       1       3       0       2       0       0       2       0       0       567     1       2       1       0       0       0       0       |  579         m     = sci.electronics
0       0       0       0       0       0       0       0       0       0       0       0       1       605     0       0       0       0       0       0       |  606         n     = sci.med
0       0       0       0       0       0       0       0       0       0       0       0       0       0       602     0       0       0       0       0       |  602         o     = sci.space
0       0       0       0       0       0       0       0       0       0       0       0       0       1       0       602     0       0       1       0       |  604         p     = soc.religion.christian
0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       556     0       0       0       |  556         q     = talk.politics.mideast
0       0       1       0       0       0       0       0       0       0       0       1       0       0       1       0       0       568     0       0       |  571         r     = talk.politics.guns
11      0       0       0       0       0       0       0       0       1       0       0       0       1       3       8       1       4       338     2       |  369         s     = talk.religion.misc
0       0       0       0       0       0       0       0       0       0       1       0       0       0       1       0       3       4       0       447     |  456         t     = talk.politics.misc

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.9806
Accuracy                                   98.9406%
Reliability                                94.0932%
Reliability (standard deviation)            0.2163

Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 15870 ms (Minutes: 0.2645)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c

[snip]

INFO: Complementary Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       6715       89.3071%
Incorrectly Classified Instances        :        804       10.6929%
Total Classified Instances              :       7519

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
298     0       0       0       0       0       0       0       0       1       0       0       0       1       2       5       1       0       13      0       |  321         a     = alt.atheism
0       298     11      6       1       12      2       2       1       1       3       8       3       4       2       4       1       4       4       1       |  368         b     = comp.graphics
1       17      286     16      4       9       6       3       2       0       1       0       1       7       1       0       2       1       0       1       |  358         c     = comp.os.ms-windows.misc
2       6       11      309     9       5       14      8       1       0       2       0       6       4       2       0       1       2       1       0       |  383         d     = comp.sys.ibm.pc.hardware
0       10      8       7       334     7       5       5       2       0       3       0       2       1       1       0       1       1       0       0       |  387         e     = comp.sys.mac.hardware
1       13      7       8       2       355     2       0       2       0       0       5       1       1       3       0       0       1       0       0       |  401         f     = comp.windows.x
0       7       11      29      12      9       268     16      8       4       3       2       6       4       2       1       3       1       2       3       |  391         g     = misc.forsale
0       1       0       0       3       0       7       362     8       2       2       1       2       0       2       0       1       2       0       4       |  397         h     = rec.autos
0       0       0       1       0       0       1       0       423     0       0       0       2       1       0       1       0       0       0       0       |  429         i     = rec.motorcycles
0       0       1       0       0       0       0       2       2       371     8       0       2       3       0       2       0       0       0       0       |  391         j     = rec.sport.baseball
0       0       1       0       0       0       1       0       0       2       409     0       0       0       0       0       0       0       0       1       |  414         k     = rec.sport.hockey
0       0       1       2       1       0       1       0       0       0       0       404     0       0       0       0       0       1       0       1       |  411         l     = sci.crypt
0       5       4       11      1       3       7       9       2       5       3       3       339     2       6       0       1       1       2       1       |  405         m     = sci.electronics
0       4       0       1       0       0       0       1       0       1       1       0       3       367     3       1       2       0       0       0       |  384         n     = sci.med
0       1       2       0       1       0       2       0       0       1       0       0       1       1       375     0       1       0       0       0       |  385         o     = sci.space
4       2       1       1       0       0       1       1       2       0       0       1       1       5       1       367     4       0       1       1       |  393         p     = soc.religion.christian
0       1       0       0       0       0       0       0       0       2       0       0       0       0       0       2       378     0       1       0       |  384         q     = talk.politics.mideast
0       0       0       0       0       2       1       1       1       1       0       3       0       3       0       0       2       319     2       4       |  339         r     = talk.politics.guns
32      0       0       1       0       0       0       0       0       1       1       1       0       2       2       26      5       7       175     6       |  259         s     = talk.religion.misc
0       0       0       2       0       0       0       0       0       1       2       2       0       1       2       1       10      18      2       278     |  319         t     = talk.politics.misc

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.8594
Accuracy                                   89.3071%
Reliability                                 84.611%
Reliability (standard deviation)            0.2148

Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
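
The percentages in the Summary blocks follow directly from the instance counts. Below is a minimal Python sketch that recomputes them; it is not part of Mahout, and the function name accuracy_percent is made up for illustration. The counts are copied from the two Complementary naive bayes Summary blocks above.

```python
def accuracy_percent(count, total):
    """Return `count` as a percentage of `total` classified instances."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# (correctly classified, total classified), copied from the Summary blocks.
runs = {
    "training set": (11207, 11327),
    "holdout set": (6715, 7519),
}

for name, (correct, total) in runs.items():
    incorrect = total - correct
    print(f"{name}: {accuracy_percent(correct, total):.4f}% correct, "
          f"{accuracy_percent(incorrect, total):.4f}% incorrect")
```

Run as-is, this should reproduce the 98.9406% and 89.3071% accuracy figures reported above. The Kappa and Reliability lines are left out because their exact definitions in Mahout are not shown in this thread.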


*Naive bayes:*
INFO: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :      11286       99.0869%
Incorrectly Classified Instances        :        104        0.9131%
Total Classified Instances              :      11390

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
474     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       2       1       |  477         a     = alt.atheism
0       566     0       2       0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  569         b     = comp.graphics
0       10      590     29      2       4       1       0       0       0       0       0       1       0       0       0       0       0       0       1       |  638         c     = comp.os.ms-windows.misc
0       0       0       596     0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       |  596         d     = comp.sys.ibm.pc.hardware
0       0       0       0       575     0       1       0       0       0       0       0       1       0       0       0       0       0       0       0       |  577         e     = comp.sys.mac.hardware
0       2       2       2       0       593     1       0       0       0       0       0       0       0       1       0       0       0       0       0       |  601         f     = comp.windows.x
0       0       0       1       0       0       589     1       0       0       1       0       2       0       0       0       0       0       0       0       |  594         g     = misc.forsale
0       0       0       0       0       0       0       594     0       0       0       0       0       0       0       0       0       0       0       0       |  594         h     = rec.autos
0       0       0       0       0       0       0       0       611     0       0       0       0       0       0       0       0       0       0       0       |  611         i     = rec.motorcycles
0       0       0       0       0       0       0       0       0       616     1       0       0       0       0       0       0       0       0       0       |  617         j     = rec.sport.baseball
0       0       0       0       0       0       1       0       0       0       620     0       0       0       0       0       0       0       0       0       |  621         k     = rec.sport.hockey
0       0       0       0       0       0       0       0       0       0       0       580     0       0       0       0       0       1       0       0       |  581         l     = sci.crypt
0       0       0       3       1       0       0       0       0       0       0       0       571     0       0       0       0       0       0       0       |  575         m     = sci.electronics
0       0       0       0       0       0       0       0       0       0       0       0       2       583     0       0       0       0       0       0       |  585         n     = sci.med
0       0       0       0       0       0       0       0       0       0       0       0       0       1       599     0       0       0       0       0       |  600         o     = sci.space
0       1       0       0       0       0       0       0       0       0       0       0       0       0       0       615     0       0       0       0       |  616         p     = soc.religion.christian
1       0       0       0       0       0       0       0       0       0       0       0       0       0       0       1       560     0       0       0       |  562         q     = talk.politics.mideast
0       0       1       0       0       0       0       0       0       0       0       1       0       0       0       0       0       548     0       1       |  551         r     = talk.politics.guns
10      0       0       0       0       0       0       0       0       0       0       0       0       0       1       1       0       2       344     1       |  359         s     = talk.religion.misc
0       0       0       0       0       0       0       0       0       0       0       1       1       0       0       0       0       2       0       462     |  466         t     = talk.politics.misc

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.9847
Accuracy                                   99.0869%
Reliability                                94.3334%
Reliability (standard deviation)            0.2169

Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 14304 ms (Minutes: 0.2384)
+ echo 'Testing on holdout set'
Testing on holdout set

[snip]

INFO: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       6718       90.1019%
Incorrectly Classified Instances        :        738        9.8981%
Total Classified Instances              :       7456

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       i       j       k       l       m       n       o       p       q       r       s       t       <--Classified as
294     0       0       0       0       0       0       0       0       0       0       2       0       1       1       6       1       1       16      0       |  322         a     = alt.atheism
0       345     6       14      6       11      6       0       0       0       0       5       7       1       3       0       0       0       0       0       |  404         b     = comp.graphics
2       29      177     78      22      19      9       1       0       0       0       4       2       0       1       1       0       0       1       1       |  347         c     = comp.os.ms-windows.misc
1       9       2       335     18      2       10      0       0       0       1       0       8       0       0       0       0       0       0       0       |  386         d     = comp.sys.ibm.pc.hardware
1       4       2       13      347     3       5       1       0       0       1       0       7       1       0       0       0       1       0       0       |  386         e     = comp.sys.mac.hardware
0       20      0       4       0       352     4       0       0       0       0       0       1       1       3       0       1       0       1       0       |  387         f     = comp.windows.x
0       2       0       21      5       1       323     7       2       2       0       2       12      0       3       0       0       0       0       1       |  381         g     = misc.forsale
0       1       0       0       1       0       15      363     8       1       0       0       4       1       0       0       0       1       0       1       |  396         h     = rec.autos
0       1       0       0       0       0       6       6       370     0       0       0       0       1       0       0       0       0       1       0       |  385         i     = rec.motorcycles
1       0       0       1       1       0       2       1       2       362     5       0       2       0       0       0       0       0       0       0       |  377         j     = rec.sport.baseball
0       0       0       1       2       0       0       0       0       3       371     0       0       0       0       0       0       0       0       1       |  378         k     = rec.sport.hockey
0       3       1       0       1       0       2       0       0       0       0       396     0       1       0       0       1       1       1       3       |  410         l     = sci.crypt
0       7       0       7       7       2       6       4       0       0       0       1       369     2       2       0       0       0       0       2       |  409         m     = sci.electronics
0       3       0       2       1       0       2       0       0       0       0       1       4       383     4       0       0       1       0       4       |  405         n     = sci.med
0       5       0       0       1       0       3       0       0       0       0       0       1       0       374     1       0       0       1       1       |  387         o     = sci.space
6       2       0       1       1       0       0       1       0       1       0       0       1       5       0       352     2       1       7       1       |  381         p     = soc.religion.christian
1       1       0       0       0       0       0       0       0       0       1       0       0       0       0       0       373     1       0       1       |  378         q     = talk.politics.mideast
0       0       0       0       0       0       1       0       1       0       0       2       0       0       0       0       0       346     2       7       |  359         r     = talk.politics.guns
26      1       0       1       0       0       0       2       0       1       1       0       0       1       1       20      2       6       200     7       |  269         s     = talk.religion.misc
1       0       0       0       0       0       0       2       0       0       1       0       0       2       2       0       1       14      0       286     |  309         t     = talk.politics.misc

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.8726
Accuracy                                   90.1019%
Reliability                                85.4491%
Reliability (standard deviation)            0.2222

Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10878 ms (Minutes: 0.1813)

*SGD:*
7532 test files

=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       5649            75%
Incorrectly Classified Instances        :       1883            25%
Total Classified Instances              :       7532

=======================================================
Confusion Matrix
-------------------------------------------------------
a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
186  6    3    10   5    0    33   4    13   15   7    1    24   15   3    15   5    5    29   15   |  394  a = sci.space
5    309  0    3    2    5    0    0    0    1    9    21   2    0    0    18   4    4    1    1    |  385  b = comp.sys.mac.hardware
4    1    101  3    0    1    63   0    7    0    1    1    5    16   3    0    3    7    1    34   |  251  c = talk.religion.misc
11   12   1    265  1    10   3    0    0    17   10   11   5    2    0    11   3    6    21   0    |  389  d = comp.graphics
2    1    1    0    349  2    3    0    3    2    6    1    5    1    0    2    15   2    1    2    |  398  e = rec.motorcycles
7    20   3    19   2    254  6    0    2    11   2    39   7    2    0    4    2    2    9    3    |  394  f = comp.os.ms-windows.misc
2    1    13   0    0    0    247  0    1    1    3    0    6    2    4    0    2    3    5    29   |  319  g = alt.atheism
1    1    0    0    2    0    2    361  0    1    2    0    2    0    0    1    3    22   0    1    |  399  h = rec.sport.hockey
3    0    3    1    0    0    5    0    161  0    1    2    12   102  0    0    1    2    11   6    |  310  i = talk.politics.misc
2    8    0    19   0    19   0    0    1    294  10   11   4    2    0    5    0    3    11   6    |  395  j = comp.windows.x
2    10   0    1    1    0    0    0    0    1    347  13   2    1    0    5    3    2    2    0    |  390  k = misc.forsale
1    36   0    6    1    25   0    0    1    6    10   257  2    1    0    34   6    0    6    0    |  392  l = comp.sys.ibm.pc.hardware
2    2    2    2    1    0    12   0    0    6    10   4    312  5    2    13   11   3    3    6    |  396  m = sci.med
2    0    3    2    1    0    0    1    13   0    5    1    2    314  2    0    2    2    10   4    |  364  n = talk.politics.guns
1    0    2    1    1    0    34   1    33   1    3    0    1    8    271  1    4    5    6    3    |  376  o = talk.politics.mideast
3    14   0    8    2    8    3    1    1    7    12   29   6    2    1    245  13   2    32   4    |  393  p = sci.electronics
3    3    0    2    11   0    1    0    2    1    11   6    4    2    0    11   330  4    4    1    |  396  q = rec.autos
0    0    1    0    1    0    4    12   3    1    3    0    0    0    0    5    6    359  1    1    |  397  r = rec.sport.baseball
0    1    0    0    0    1    0    0    3    3    0    0    3    2    1    6    1    6    366  3    |  396  s = sci.crypt
0    2    11   1    1    0    40   0    1    2    3    4    2    1    0    5    0    2    2    321  |  398  t = soc.religion.christian

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.7073
Accuracy                                        75%
Reliability                                70.6238%
Reliability (standard deviation)            0.2187
Log-likelihood                mean      :    -1.1182
                              25%-ile   :    -1.6911
                              75%-ile   :    -0.0803

Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
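
For anyone who wants to sanity-check the Summary and Statistics blocks by hand, here is a minimal sketch (not Mahout code) that recomputes accuracy and a textbook Cohen's kappa from a confusion matrix. The function name accuracy_and_kappa is made up for illustration, and the input is the four-class "complementary" matrix from the asf-email-examples.sh output quoted further down in this thread. The accuracy should agree with the 90.1679% reported there; the kappa comes out close to, but not necessarily identical to, the reported 0.8259, since Mahout's own computation may differ in detail.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of instances whose actual class is i
    and which were classified as class j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: the fraction of instances on the diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    observed = correct / total

    # Chance agreement: product of row and column marginals per class.
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Rows/columns: dev, general, user, commits (copied from the quoted output).
complementary = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]

acc, kappa = accuracy_and_kappa(complementary)
print(f"accuracy = {acc:.4%}, kappa = {kappa:.4f}")
```

The same check applies to the SGD run above: 5649 correct out of 7532 is exactly the 75% shown in its Summary.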




On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:

> Thanks Andrew for reporting that. I rolled back the release to fix this
> and a few other issues.
>
> We have removed asf-examples*.sh from trunk as the sample file at the URL
> mentioned in your email is not available.
> This is something we need to fix and restore in 1.0.
>
>
>
>
>
>
>
> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> from the asf-email-examples.sh script:
>
> # You will need to download or otherwise obtain some or all of the Amazon
> # ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
> # to use this script.
> # To obtain a full copy you will need to launch an EC2 instance and mount
> # the dataset to download it, otherwise you can get a sample of it at
> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>
> It looks like the:
> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>
> link is down.
>
> Is there somewhere else that we can get a subset of the ASF emails?
>
>
>
> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: andrew.musselman@gmail.com
> > To: dev@mahout.apache.org
> >
> > Sure thing; continuing to smoke test the other examples tonight
> >
> >
> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com
> >wrote:
> >
> > > Thanks Andrew M., see that some of the example scripts need to be
> fixed as
> > > they still refer to the deprecated algorithms.
> > > See that the Streaming KMeans has failed for you as well.
> > >
> > > I'll be rolling back the release today to fix these issues.
> > >
> > >
> > >
> > >
> > >
> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> 64-bit
> > > Linux AMI from tarball.
> > >
> > > All tests pass.
> > >
> > > *Output of examples:*
> > > *asf-email-examples.sh, run on mahout.apache.org
> > > <http://mahout.apache.org>:*
> > > *recommendations:*
> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> > > 1
> > >
> > >
> [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > > 4
> > >
> > >
> [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > > 6
> > >
> > >
> [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > > 8
> > >     [12758:1.0,19409:1.0,11112:1.0]
> > > 11
> > >
> > >
> [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > > 14
> > >
> > >
> [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > > 15
> > >
> > >
> [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > > 16
> > >
> > >
> [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > 18
> > >
> > >
> [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > 20
> > >
> > >
> [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > [snip]
> > >
> > > *clustering; kmeans:*
> > > [snip]
> > >         Weight : [props - optional]:  Point:
> > >         1.0 :
> > >  [distance-squared=1.0193102046188427]:
> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> 7573:0.204,
> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > >         1.0 : [distance-squared=0.9823018320457279]:
> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> 5336:0.106,
> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > >         1.0 : [distance-squared=0.9509142993214911]:
> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > >  4419:0.076,
> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> 10225:0.081,
> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > >  43685:0.086, 44077:0.308,
> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > [snip]
> > >
> > > *clustering; dirichlet:*
> > > Get this complaint:
> > > Running Dirichlet with K = 8
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > > HADOOP_CONF_DIR=
> > > MAHOUT-JOB:
> > >
> > >
> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class:
> dirichlet
> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > > classpath, will use command-line arguments only
> > > Unknown program 'dirichlet' chosen.
> > >
> > > *clustering: minhash:*
> > > Running Minhash
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > > HADOOP_CONF_DIR=
> > > MAHOUT-JOB:
> > >
> > >
> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > 14/01/21 05:17:27 WARN
> > >  driver.MahoutDriver: Unable to add class: minhash
> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> > > classpath, will use command-line arguments only
> > > Unknown program 'minhash' chosen.
> > >
> > > *classification; standard:*
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances          :       5384       87.7874%
> > > Incorrectly Classified Instances        :        749       12.2126%
> > > Total Classified Instances              :       6133
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > a       b       c       d
> > >     <--Classified as
> > > 2949    7       531     25       |  3512        a     = dev
> > > 0       0       0       0        |  0           b     = general
> > > 99      8       1763    8        |  1878        c     = user
> > > 41      1       29      672      |  743         d     = commits
> > >
> > > =======================================================
> > > Statistics
> > > -------------------------------------------------------
> > > Kappa
> > >  0.7877
> > > Accuracy                                   87.7874%
> > > Reliability                                 53.658%
> > > Reliability (standard deviation)            0.4911
> > >
> > > *classification; complementary:*
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances          :       5530       90.1679%
> > > Incorrectly Classified Instances        :        603        9.8321%
> > > Total Classified Instances              :
> > >  6133
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > a       b       c       d       <--Classified as
> > > 3168    0       276     68       |  3512        a     = dev
> > > 0       0       0       0        |  0           b     = general
> > > 196     0       1652    30       |  1878        c     = user
> > > 25      0       8       710      |  743         d     =
> > >  commits
> > >
> > > =======================================================
> > > Statistics
> > > -------------------------------------------------------
> > > Kappa                                       0.8259
> > > Accuracy                                   90.1679%
> > > Reliability                                54.7459%
> > > Reliability (standard deviation)            0.5005
> > >
> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms
> (Minutes:
> > > 0.34836666666666666)
> > >
> > > *classification; sgd, with three categories:*
> > > Running SGD Training
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop
> > >  and
> > > HADOOP_CONF_DIR=
> > > MAHOUT-JOB:
> > >
> > >
> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on
> classpath,
> > > will use command-line arguments only
> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > 24168 training files
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000
> > >  2
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> > > 0.000   0.00    none
> > > 0.00    0.00
> > >    0.00    0.00    0.0000000       0.0000000       12
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000
> > >     0.0000000       40
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> > > 0.000
> > >  0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> > > 0.000   0.00    none
> > > 0.00    0.00
> > >  0.00    0.00    0.0000000       0.0000000       300
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> > > 0.000   0.00    none
> > > 0.00    0.00    0.00    0.00    0.0000000
> > >  0.0000000       800
> > > 0.000   0.00    none
> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > 1.0019413e-08   1000    -0.607  75.78   none
> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > 1.0019413e-08   1200    -0.607  75.78   none
> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > 1.0019413e-08   1400    -0.607  75.78   none
> > > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > > 1.0019413e-08   1500    -0.607  75.78   none
> > > 0.24    43686.00        17924.00        329.50
> > >  1.0571799e-08
> > > 1.0032261e-08   2000    -0.487  82.65   none
> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > > 1.0011902e-08   2500    -0.439  83.90   none
> > > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > > 1.0011902e-08   3000    -0.439  83.90   none
> > > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> > > 1.0000001e-08   4000    -0.351  88.14   none
> > > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> > > 1.0000000e-08   5000    -0.378  87.10   none
> > > 0.32    50635.00        36461.00        437.09
> > >  1.0556652e-08
> > > 1.0000001e-08   6000    -0.372  86.89   none
> > > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> > > 1.0000001e-08   7000    -0.334  89.26   none
> > > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> > > 1.0000000e-08   8000    -0.368  87.52   none
> > > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> > > 1.0000000e-08   10000   -0.374  87.39   none
> > > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> > > 1.0000000e-08   12000   -0.298  88.26   none
> > > Exception in thread "main" java.lang.IllegalStateException:
> > > java.lang.ArrayIndexOutOfBoundsException:
> > >  2
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > >         at
> > >
> org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > >         at
> > >
> org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >         at
> > >
> > >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >
> > >  at
> > >
> > >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >         at
> > >
> > >
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >         at
> > > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >         at
> > > org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >         at
> > >
> > >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >         at
> > >
> > >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >         at java.lang.reflect.Method.invoke(Method.java:622)
> > >         at
> > >  org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > >         at
> > > org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > >         at
> > >
> > >
> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > >
> > >  at
> > >
> > >
> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > >         at
> > >
> > >
> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > >         at
> > >
> > >
> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > >         at
> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >         at
> > >
> > >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >         at
> > >
> > >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:701)
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > > > Trying out the build today
> > > >
> > > >
> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> suneel_marthi@yahoo.com
> > > >wrote:
> > > >
> > > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > > >> Release, will be rerolling the release today (in the next few hrs)
> and
> > > >> putting out a new release candidate in staging.
> > > >>
> > > >> Thanks for reporting this Andrew P.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > > ap.dev@outlook.com>
> > > >> wrote:
> > > >>
> > > >> I ran through the tests with on a CentOS VM
> > >  AMD64 2 cores 4 GB RAM.  Had
> > > >> a bit of trouble getting the Hadoop natives to compile and
> therefore may
> > > >> have run into some problems because of the hadoop setup.  Ran into
> some
> > > >> problems in the example scripts.  Particularly with
> > > >> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest
> of the
> > > >> examples when im sure I've got hadoop setup right.
> > > >>
> > > >>
> > > >> Apache Maven 3.1.2-SNAPSHOT
> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> "amd64",
> > > >> family: "unix"
> > > >> $MAHOUT_LOCAL=true
> > > >> Hadoop 2.2.0
> > > >>
> > > >>
> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
> > > >> [passed ]
> > > >>
> > > >> b) Verify u r able to compile the
> > >  distro
> > > >>
> > > >>     mvn compile- [passed with warnings]
> > > >>
> > > >>     [WARNING]  Expected all dependencies to require Scala version:
> 2.9.3
> > > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires
> scala
> > > >> version: 2.9.3
> > > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> > > >> version: 2.9.2
> > > >>     [WARNING] Multiple versions of scala libraries detected!
> > > >>
> > > >> c)  Run through the unit tests: mvn clean test
> > > >>     mvn clean test [passed]
> > > >>
> > > >> d) Run the
> > > >>  example scripts under $MAHOUT_HOME/examples/bin.
> > > >> Please run through all the different options in each script
> > > >>
> > > >>     Running example scripts with $MAHOUT_LOCAL=true
> > > >>
> > > >>
> > >  ./cluster-syntheticcontrol.sh ->1 [works]
> > > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > > >>
> > > >>
> > > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > >>     [...]
> > > >>     WARNING: Unable to add class:
> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>     java.lang.ClassNotFoundException:
> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>         at
> > > >>  java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>         at java.security.AccessController.doPrivileged(Native
> Method)
> > > >>         at
> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>         at
> > >  java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>         at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>         at java.lang.Class.forName0(Native Method)
> > > >>         at java.lang.Class.forName(Class.java:171)
> > > >>         at
> > > >>
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>         at
> > > >>  org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>
> > > >>
> > > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > >>
> > > >>     WARNING: Unable to add class:
> > > >>
> > >  org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>     java.lang.ClassNotFoundException:
> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>         at java.security.AccessController.doPrivileged(Native
> Method)
> > > >>         at
> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>         at
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>         at java.lang.Class.forName0(Native Method)
> > > >>         at
> > >  java.lang.Class.forName(Class.java:171)
> > > >>         at
> > > >>
> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>         at
> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>     WARNING: No
> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props
> found
> > > on
> > > >> classpath, will use command-line arguments only
> > > >>     Unknown program
> > > >>  'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job'
> chosen.
> > > >>
> > > >>
> > > >>     ./classify-20newsgroups.sh ->1 [works]
> > > >>     ./classify-20newsgroups.sh ->2 [works]
> > > >>
> > > >>
> > > >>     cluster-reuters.sh ->1 [works]
> > > >>
> > >  cluster-reuters.sh ->2 [works]
> > > >>     cluster-reuters.sh ->3 [works]
> > > >>
> > > >>     Same error as noted previosly in the thread:
> > > >>
> > > >>     cluster-reuters.sh ->4 [0 clusters]
> > > >>
> > > >>     [...]
> > > >>
> > > >>     WARNING: No qualcluster.props found on classpath, will use
> > > >> command-line arguments only
> > > >>     Num clusters: 0; maxDistance: 0.000000
> > > >>     [Dunn Index]
> > > >>  First: Infinity
> > > >>     [Davies-Bouldin Index] First: NaN
> > > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > > >>     cluster,distance.mean,distance.sd
> > > >>
> > >
> > >
> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > > >> > From: suneel_marthi@yahoo.com
> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > > >> >
> > > >> > Third time's a Charm!!!
> > > >> >
> > > >> >
> > > >> > Here's the new URL for Mahout 0.9 Release:
> > > >> >
> > > >>
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > > >> >
> > > >> > For those volunteering to test this, some of the things to be
> > > verified:
> > > >> >
> > > >> > a) Verify that u can unpack the release (tar or zip)
> > > >> > b) Verify u r able to compile the distro
> > > >> > c)  Run through the unit tests: mvn clean test
> > > >> > d) Run the example scripts
> > > >>  under $MAHOUT_HOME/examples/bin. Please run through all the
> different
> > > >> options in each script.
> > > >> >
> > > >> >
> > > >> > Committers
> > > >> >  and PMC members:
> > > >> > ---------------------------------------
> > > >> >
> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > > >> >
> > > >> >
> > > >> > Thanks and
> > >  Regards.
> > > >>
> > > >
> > > >
> > >
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew for reporting that. I rolled back the release to fix this and a few other issues.

We have removed asf-examples*.sh from trunk as the sample file at the URL mentioned in your email is not available.
This is something we need to fix and restore in 1.0.







On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com> wrote:
 
from the asf-email-examples.sh script:

# You will need to download or otherwise obtain some or all of the Amazon ASF
# Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use
# this script.
# To obtain a full copy you will need to launch an EC2 instance and mount the
# dataset to download it, otherwise you can get a sample of it at
# http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout

It looks like the link
http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
is down.

Is there somewhere else that we can get a subset of the ASF emails?



Date: Tue, 21 Jan 2014 09:48:06 -0800
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: andrew.musselman@gmail.com
> To: dev@mahout.apache.org
> 
> Sure thing; continuing to smoke test the other examples tonight
> 
> 
> On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <su...@yahoo.com>wrote:
> 
> > Thanks Andrew M., see that some of the example scripts need to be fixed as
> > they still refer to the deprecated algorithms.
> > See that the Streaming KMeans has failed for you as well.
> >
> > I'll be rolling back the release today to fix these issues.
> >
> >
> >
> >
> >
> > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > Linux AMI from tarball.
> >
> > All tests pass.
> >
> > *Output of examples:*
> > *asf-email-examples.sh, run on mahout.apache.org
> > <http://mahout.apache.org>:*
> > *recommendations:*
> > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> > 1
> >
> > [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > 4
> >
> > [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > 6
> >
> > [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > 8
> >     [12758:1.0,19409:1.0,11112:1.0]
> > 11
> >
> > [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > 14
> >
> > [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > 15
> >
> > [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > 16
> >
> > [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > 18
> >
> > [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > 20
> >
> > [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > [snip]
> >
> > *clustering; kmeans:*
> > [snip]
> >         Weight : [props - optional]:  Point:
> >         1.0 : [distance-squared=1.0193102046188427]:
> > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > 39789:0.110, 40743:0.190, 45775:0.086]
> >         1.0 : [distance-squared=0.9823018320457279]:
> > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> >         1.0 : [distance-squared=0.9509142993214911]:
> > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
> > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
> > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
> > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > [snip]
> >
> > *clustering; dirichlet:*
> > Get this complaint:
> > Running Dirichlet with K = 8
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'dirichlet' chosen.
> >
> > *clustering: minhash:*
> > Running Minhash
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'minhash' chosen.
> >
> > *classification; standard:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :       5384       87.7874%
> > Incorrectly Classified Instances        :        749       12.2126%
> > Total Classified Instances              :       6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       b       c       d       <--Classified as
> > 2949    7       531     25       |  3512        a     = dev
> > 0       0       0       0        |  0           b     = general
> > 99      8       1763    8        |  1878        c     = user
> > 41      1       29      672      |  743         d     = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa                                       0.7877
> > Accuracy                                   87.7874%
> > Reliability                                 53.658%
> > Reliability (standard deviation)            0.4911
> >
> > *classification; complementary:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :       5530       90.1679%
> > Incorrectly Classified Instances        :        603        9.8321%
> > Total Classified Instances              :       6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       b       c       d       <--Classified as
> > 3168    0       276     68       |  3512        a     = dev
> > 0       0       0       0        |  0           b     = general
> > 196     0       1652    30       |  1878        c     = user
> > 25      0       8       710      |  743         d     = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa                                       0.8259
> > Accuracy                                   90.1679%
> > Reliability                                54.7459%
> > Reliability (standard deviation)            0.5005
> >
> > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes:
> > 0.34836666666666666)
> >
> > *classification; sgd, with three categories:*
> > Running SGD Training
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
> > will use command-line arguments only
> > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > 24168 training files
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> > 0.000   0.00    none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1000    -0.607  75.78   none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1200    -0.607  75.78   none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1400    -0.607  75.78   none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1500    -0.607  75.78   none
> > 0.24    43686.00        17924.00        329.50  1.0571799e-08
> > 1.0032261e-08   2000    -0.487  82.65   none
> > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > 1.0011902e-08   2500    -0.439  83.90   none
> > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > 1.0011902e-08   3000    -0.439  83.90   none
> > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> > 1.0000001e-08   4000    -0.351  88.14   none
> > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> > 1.0000000e-08   5000    -0.378  87.10   none
> > 0.32    50635.00        36461.00        437.09  1.0556652e-08
> > 1.0000001e-08   6000    -0.372  86.89   none
> > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> > 1.0000001e-08   7000    -0.334  89.26   none
> > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> > 1.0000000e-08   8000    -0.368  87.52   none
> > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> > 1.0000000e-08   10000   -0.374  87.39   none
> > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> > 1.0000000e-08   12000   -0.298  88.26   none
> > Exception in thread "main" java.lang.IllegalStateException:
> > java.lang.ArrayIndexOutOfBoundsException: 2
> >         at
> >
> > org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> >         at
> >
> > org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> >         at
> >
> > org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> >         at
> > org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> >         at
> > org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at
> >
> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at
> >
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:622)
> >         at
> >
> > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >         at
> > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >         at
> > org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at
> >
> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at
> >
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:622)
> >         at
> >  org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> >         at
> > org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> >         at
> >
> > org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> >         at
> >
> > org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> >         at
> >
> > org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> >         at
> >
> > org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> >         at
> >
> > org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> >         at
> >
> > org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> >         at
> >
> > org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> >         at
> >
> > org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> >         at
> > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >         at
> >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >         at
> >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:701)
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > > Trying out the build today
> > >
> > >
> > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com
> > >wrote:
> > >
> > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > >> Release, will be rerolling the release today (in the next few hrs) and
> > >> putting out a new release candidate in staging.
> > >>
> > >> Thanks for reporting this Andrew P.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > ap.dev@outlook.com>
> > >> wrote:
> > >>
> > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had
> > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > >> have run into some problems because of the Hadoop setup.  Ran into some
> > >> problems in the example scripts, particularly with
> > >> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the
> > >> examples when I'm sure I've got Hadoop set up right.
> > >>
> > >>
> > >> Apache Maven 3.1.2-SNAPSHOT
> > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> > >> family: "unix"
> > >> $MAHOUT_LOCAL=true
> > >> Hadoop 2.2.0
> > >>
> > >>
> > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
> > >> [passed ]
> > >>
> > >> b) Verify u r able to compile the distro
> > >>
> > >>     mvn compile- [passed with warnings]
> > >>
> > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala
> > >> version: 2.9.3
> > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> > >> version: 2.9.2
> > >>     [WARNING] Multiple versions of scala libraries detected!
> > >>
> > >> c)  Run through the unit tests: mvn clean test
> > >>     mvn clean test [passed]
> > >>
> > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >> Please run through all the different options in each script
> > >>
> > >>     Running example scripts with $MAHOUT_LOCAL=true
> > >>
> > >>
> > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > >>
> > >>
> > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>     [...]
> > >>     WARNING: Unable to add class:
> > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>     java.lang.ClassNotFoundException:
> > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>         at
> > >>  java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>         at java.lang.Class.forName0(Native Method)
> > >>         at java.lang.Class.forName(Class.java:171)
> > >>         at
> > >> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>         at
> > >>  org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>
> > >>
> > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>
> > >>     WARNING: Unable to add class:
> > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>     java.lang.ClassNotFoundException:
> > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>         at java.security.AccessController.doPrivileged(Native Method)
> > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>         at java.lang.Class.forName0(Native Method)
> > >>         at java.lang.Class.forName(Class.java:171)
> > >>         at
> > >> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>         at
> > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>     WARNING: No
> > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found
> > on
> > >> classpath, will use command-line arguments only
> > >>     Unknown program
> > >>  'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>
> > >>
> > >>     ./classify-20newsgroups.sh ->1 [works]
> > >>     ./classify-20newsgroups.sh ->2 [works]
> > >>
> > >>
> > >>     cluster-reuters.sh ->1 [works]
> > >>     cluster-reuters.sh ->2 [works]
> > >>     cluster-reuters.sh ->3 [works]
> > >>
> > >>     Same error as noted previously in the thread:
> > >>
> > >>     cluster-reuters.sh ->4 [0 clusters]
> > >>
> > >>     [...]
> > >>
> > >>     WARNING: No qualcluster.props found on classpath, will use
> > >> command-line arguments only
> > >>     Num clusters: 0; maxDistance: 0.000000
> > >>     [Dunn Index] First: Infinity
> > >>     [Davies-Bouldin Index] First: NaN
> > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >> > From: suneel_marthi@yahoo.com
> > >> > Subject: MAHOUT 0.9 Release - New URL
> > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >> >
> > >> > Third time's a Charm!!!
> > >> >
> > >> >
> > >> > Here's the new URL for Mahout 0.9 Release:
> > >> >
> > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >> >
> > >> > For those volunteering to test this, some of the things to be verified:
> > >> >
> > >> > a) Verify that u can unpack the release (tar or zip)
> > >> > b) Verify u r able to compile the distro
> > >> > c)  Run through the unit tests: mvn clean test
> > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different
> > >> > options in each script.
> > >> >
> > >> >
> > >> > Committers
> > >> >  and PMC members:
> > >> > ---------------------------------------
> > >> >
> > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >> >
> > >> >
> > >> > Thanks and Regards.
> > >>
> > >
> > >
> >
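
The accuracy lines in the "classification; standard" and "classification; complementary" summaries quoted earlier in this message can be checked against their confusion matrices, since correct predictions sit on the diagonal. A small Python sketch of that arithmetic follows; the matrices are copied from the quoted output, and the function name `accuracy` is hypothetical.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Rows are the actual lists (dev, general, user, commits);
# columns are what each message was classified as.
standard = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]
complementary = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]

print(f"{accuracy(standard):.4%}")       # 5384 correct out of 6133
print(f"{accuracy(complementary):.4%}")  # 5530 correct out of 6133
```

Both matrices sum to the 6133 instances reported in the summaries, and the "general" row is empty in both, so that label never occurs in this test split.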

RE: MAHOUT 0.9 Release - New URL

Posted by Andrew Palumbo <ap...@outlook.com>.
from the asf-email-examples.sh script:

# You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
# To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
# http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout

It looks like the link
http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
is down.

Is there somewhere else that we can get a subset of the ASF emails?
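
A dead download link like this one can be caught before an example script is started. The following is a minimal Python sketch, not part of the Mahout scripts: the helper name `is_reachable`, the 10-second timeout, and the "status below 400" rule are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers DNS failures and HTTP error codes (HTTPError);
        # ValueError covers malformed URLs.
        return False
```

Calling `is_reachable` with the sample-data URL from the script comment would report `False` while the link is down. The `opener` parameter exists only so the check can be exercised without network access.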



Date: Tue, 21 Jan 2014 09:48:06 -0800
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: andrew.musselman@gmail.com
> To: dev@mahout.apache.org
> 
> Sure thing; continuing to smoke test the other examples tonight
> 
> 
> On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <su...@yahoo.com>wrote:
> 
> > Thanks Andrew M., see that some of the example scripts need to be fixed as
> > they still refer to the deprecated algorithms.
> > See that the Streaming KMeans has failed for you as well.
> >
> > I'll be rolling back the release today to fix these issues.
> >
> >
> >
> >
> >
> > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > Linux AMI from tarball.
> >
> > All tests pass.
> >
> > *Output of examples:*
> > *asf-email-examples.sh, run on mahout.apache.org
> > <http://mahout.apache.org>:*
> > *recommendations:*
> > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> > 1
> >
> > [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > 4
> >
> > [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > 6
> >
> > [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > 8
> >     [12758:1.0,19409:1.0,11112:1.0]
> > 11
> >
> > [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > 14
> >
> > [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > 15
> >
> > [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > 16
> >
> > [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > 18
> >
> > [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > 20
> >
> > [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > [snip]
> >
> > *clustering; kmeans:*
> > [snip]
> >         Weight : [props - optional]:  Point:
> >         1.0 : [distance-squared=1.0193102046188427]:
> > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > 39789:0.110, 40743:0.190, 45775:0.086]
> >         1.0 : [distance-squared=0.9823018320457279]:
> > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> >         1.0 : [distance-squared=0.9509142993214911]:
> > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
> > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
> > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
> > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > [snip]
> >
> > *clustering; dirichlet:*
> > Get this complaint:
> > Running Dirichlet with K = 8
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'dirichlet' chosen.
> >
> > *clustering: minhash:*
> > Running Minhash
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'minhash' chosen.
> >
> > *classification; standard:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :       5384       87.7874%
> > Incorrectly Classified Instances        :        749       12.2126%
> > Total Classified Instances              :       6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       b       c       d       <--Classified as
> > 2949    7       531     25       |  3512        a     = dev
> > 0       0       0       0        |  0           b     = general
> > 99      8       1763    8        |  1878        c     = user
> > 41      1       29      672      |  743         d     = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa                                       0.7877
> > Accuracy                                   87.7874%
> > Reliability                                 53.658%
> > Reliability (standard deviation)            0.4911
> >
> > *classification; complementary:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :       5530       90.1679%
> > Incorrectly Classified Instances        :        603        9.8321%
> > Total Classified Instances              :       6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a       b       c       d       <--Classified as
> > 3168    0       276     68       |  3512        a     = dev
> > 0       0       0       0        |  0           b     = general
> > 196     0       1652    30       |  1878        c     = user
> > 25      0       8       710      |  743         d     = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa                                       0.8259
> > Accuracy                                   90.1679%
> > Reliability                                54.7459%
> > Reliability (standard deviation)            0.5005
> >
> > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes:
> > 0.34836666666666666)
> >
> > *classification; sgd, with three categories:*
> > Running SGD Training
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
> > will use command-line arguments only
> > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > 24168 training files
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> > 0.000   0.00    none
> > 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> > 0.000   0.00    none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1000    -0.607  75.78   none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1200    -0.607  75.78   none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1400    -0.607  75.78   none
> > 0.13    32659.00        12672.00        82.50   1.3512194e-08
> > 1.0019413e-08   1500    -0.607  75.78   none
> > 0.24    43686.00        17924.00        329.50  1.0571799e-08
> > 1.0032261e-08   2000    -0.487  82.65   none
> > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > 1.0011902e-08   2500    -0.439  83.90   none
> > 0.24    49753.00        21610.00        330.71  1.3770070e-08
> > 1.0011902e-08   3000    -0.439  83.90   none
> > 0.32    50635.00        28531.00        437.09  1.0551175e-08
> > 1.0000001e-08   4000    -0.351  88.14   none
> > 0.32    50635.00        32642.00        437.09  1.0551175e-08
> > 1.0000000e-08   5000    -0.378  87.10   none
> > 0.32    50635.00        36461.00        437.09  1.0556652e-08
> > 1.0000001e-08   6000    -0.372  86.89   none
> > 0.32    50635.00        37768.00        437.09  1.0576742e-08
> > 1.0000001e-08   7000    -0.334  89.26   none
> > 0.32    50635.00        38807.00        437.09  1.0576742e-08
> > 1.0000000e-08   8000    -0.368  87.52   none
> > 0.32    50635.00        44731.00        437.09  1.0576716e-08
> > 1.0000000e-08   10000   -0.374  87.39   none
> > 0.32    50635.00        45672.00        437.09  1.0576716e-08
> > 1.0000000e-08   12000   -0.298  88.26   none
> > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> >         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> >         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:622)
> >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:622)
> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> >         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> >         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> >         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> >         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> >         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> >         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:701)
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > > Trying out the build today
> > >
> > >
> > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com
> > >wrote:
> > >
> > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > >> Release, will be rerolling the release today (in the next few hrs) and
> > >> putting out a new release candidate in staging.
> > >>
> > >> Thanks for reporting this Andrew P.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > ap.dev@outlook.com>
> > >> wrote:
> > >>
> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had
> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > > >> have run into some problems because of the Hadoop setup.  Ran into some
> > > >> problems in the example scripts, particularly with
> > > >> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the
> > > >> examples when I'm sure I've got Hadoop set up right.
> > >>
> > >>
> > >> Apache Maven 3.1.2-SNAPSHOT
> > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> > >> family: "unix"
> > >> $MAHOUT_LOCAL=true
> > >> Hadoop 2.2.0
> > >>
> > >>
> > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
> > >> [passed ]
> > >>
> > > >> b) Verify u r able to compile the distro
> > >>
> > >>     mvn compile- [passed with warnings]
> > >>
> > >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> > >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala
> > >> version: 2.9.3
> > >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> > >> version: 2.9.2
> > >>     [WARNING] Multiple versions of scala libraries detected!
> > >>
> > >> c)  Run through the unit tests: mvn clean test
> > >>     mvn clean test [passed]
> > >>
> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >> Please run through all the different options in each script
> > >>
> > >>     Running example scripts with $MAHOUT_LOCAL=true
> > >>
> > >>
> > > >>     ./cluster-syntheticcontrol.sh ->1 [works]
> > >>     ./cluster-syntheticcontrol.sh ->2 [works]
> > >>     ./cluster-syntheticcontrol.sh ->3 [works]
> > >>
> > >>
> > >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>     [...]
> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>         at java.lang.Class.forName0(Native Method)
> > > >>         at java.lang.Class.forName(Class.java:171)
> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>
> > >>
> > >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>
> > > >>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>         at java.security.AccessController.doPrivileged(Native Method)
> > > >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>         at java.lang.Class.forName0(Native Method)
> > > >>         at java.lang.Class.forName(Class.java:171)
> > > >>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > > >>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>
> > >>
> > >>     ./classify-20newsgroups.sh ->1 [works]
> > >>     ./classify-20newsgroups.sh ->2 [works]
> > >>
> > >>
> > >>     cluster-reuters.sh ->1 [works]
> > >>
> > > >>     cluster-reuters.sh ->2 [works]
> > >>     cluster-reuters.sh ->3 [works]
> > >>
> > > >>     Same error as noted previously in the thread:
> > >>
> > >>     cluster-reuters.sh ->4 [0 clusters]
> > >>
> > >>     [...]
> > >>
> > >>     WARNING: No qualcluster.props found on classpath, will use
> > >> command-line arguments only
> > >>     Num clusters: 0; maxDistance: 0.000000
> > > >>     [Dunn Index] First: Infinity
> > >>     [Davies-Bouldin Index] First: NaN
> > >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>     INFO: Program took 669 ms (Minutes: 0.01115)
> > > >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >> > From: suneel_marthi@yahoo.com
> > >> > Subject: MAHOUT 0.9 Release - New URL
> > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >> >
> > >> > Third time's a Charm!!!
> > >> >
> > >> >
> > >> > Here's the new URL for Mahout 0.9 Release:
> > >> >
> > >>
> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >> >
> > > >> > For those volunteering to test this, some of the things to be
> > > >> > verified:
> > >> >
> > >> > a) Verify that u can unpack the release (tar or zip)
> > >> > b) Verify u r able to compile the distro
> > >> > c)  Run through the unit tests: mvn clean test
> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please
> > > >> > run through all the different options in each script.
> > >> >
> > >> >
> > > >> > Committers and PMC members:
> > >> > ---------------------------------------
> > >> >
> > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >> >
> > >> >
> > > >> > Thanks and Regards.
> > >>
> > >
> > >
> >

Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
Sure thing; continuing to smoke test the other examples tonight


On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <su...@yahoo.com> wrote:

> Thanks Andrew M., see that some of the example scripts need to be fixed as
> they still refer to the deprecated algorithms.
> See that the Streaming KMeans has failed for you as well.
>
> I'll be rolling back the release today to fix these issues.
>
>
>
>
>
> On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> Linux AMI from tarball.
>
> All tests pass.
>
> *Output of examples:*
> *asf-email-examples.sh, run on mahout.apache.org
> <http://mahout.apache.org>:*
> *recommendations:*
> [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> /user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
> 1       [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> 4       [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> 6       [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> 8       [12758:1.0,19409:1.0,11112:1.0]
> 11      [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> 14      [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> 15      [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> 16      [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> 18      [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> 19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> 20      [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> [snip]
>
> *clustering; kmeans:*
> [snip]
>         Weight : [props - optional]:  Point:
>         1.0 : [distance-squared=1.0193102046188427]:
> /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> 39789:0.110, 40743:0.190, 45775:0.086]
>         1.0 : [distance-squared=0.9823018320457279]:
> /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>         1.0 : [distance-squared=0.9509142993214911]:
> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
> 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
> 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
> 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> [snip]
>
> *clustering; dirichlet:*
> Get this complaint:
> Running Dirichlet with K = 8
> Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
>
> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> classpath, will use command-line arguments only
> Unknown program 'dirichlet' chosen.
>
> *clustering: minhash:*
> Running Minhash
> Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
>
> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> classpath, will use command-line arguments only
> Unknown program 'minhash' chosen.
>
> *classification; standard:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       5384       87.7874%
> Incorrectly Classified Instances        :        749       12.2126%
> Total Classified Instances              :       6133
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       <--Classified as
> 2949    7       531     25       |  3512        a     = dev
> 0       0       0       0        |  0           b     = general
> 99      8       1763    8        |  1878        c     = user
> 41      1       29      672      |  743         d     = commits
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.7877
> Accuracy                                   87.7874%
> Reliability                                 53.658%
> Reliability (standard deviation)            0.4911
>
> *classification; complementary:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :       5530       90.1679%
> Incorrectly Classified Instances        :        603        9.8321%
> Total Classified Instances              :       6133
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       <--Classified as
> 3168    0       276     68       |  3512        a     = dev
> 0       0       0       0        |  0           b     = general
> 196     0       1652    30       |  1878        c     = user
> 25      0       8       710      |  743         d     = commits
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa                                       0.8259
> Accuracy                                   90.1679%
> Reliability                                54.7459%
> Reliability (standard deviation)            0.5005
>
> 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes:
> 0.34836666666666666)
>
> *classification; sgd, with three categories:*
> Running SGD Training
> Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> HADOOP_CONF_DIR=
> MAHOUT-JOB:
>
> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
> will use command-line arguments only
> 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> --input=[asf-output/classification/sgd/splits/mapRedOut/],
> --output=[asf-output/classification/sgd/models], --poolSize=[5],
> --startPhase=[0], --tempDir=[temp], --threads=[20]}
> 24168 training files
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       1
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       2
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       3
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       4
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       6
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       8
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       10
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       12
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       15
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       20
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       25
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       30
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       40
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       50
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       60
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       70
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       80
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       100
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       120
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       140
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       150
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       200
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       250
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       300
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       400
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       500
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       600
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       700
> 0.000   0.00    none
> 0.00    0.00    0.00    0.00    0.0000000       0.0000000       800
> 0.000   0.00    none
> 0.13    32659.00        12672.00        82.50   1.3512194e-08
> 1.0019413e-08   1000    -0.607  75.78   none
> 0.13    32659.00        12672.00        82.50   1.3512194e-08
> 1.0019413e-08   1200    -0.607  75.78   none
> 0.13    32659.00        12672.00        82.50   1.3512194e-08
> 1.0019413e-08   1400    -0.607  75.78   none
> 0.13    32659.00        12672.00        82.50   1.3512194e-08
> 1.0019413e-08   1500    -0.607  75.78   none
> 0.24    43686.00        17924.00        329.50  1.0571799e-08
> 1.0032261e-08   2000    -0.487  82.65   none
> 0.24    49753.00        21610.00        330.71  1.3770070e-08
> 1.0011902e-08   2500    -0.439  83.90   none
> 0.24    49753.00        21610.00        330.71  1.3770070e-08
> 1.0011902e-08   3000    -0.439  83.90   none
> 0.32    50635.00        28531.00        437.09  1.0551175e-08
> 1.0000001e-08   4000    -0.351  88.14   none
> 0.32    50635.00        32642.00        437.09  1.0551175e-08
> 1.0000000e-08   5000    -0.378  87.10   none
> 0.32    50635.00        36461.00        437.09  1.0556652e-08
> 1.0000001e-08   6000    -0.372  86.89   none
> 0.32    50635.00        37768.00        437.09  1.0576742e-08
> 1.0000001e-08   7000    -0.334  89.26   none
> 0.32    50635.00        38807.00        437.09  1.0576742e-08
> 1.0000000e-08   8000    -0.368  87.52   none
> 0.32    50635.00        44731.00        437.09  1.0576716e-08
> 1.0000000e-08   10000   -0.374  87.39   none
> 0.32    50635.00        45672.00        437.09  1.0576716e-08
> 1.0000000e-08   12000   -0.298  88.26   none
> Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>         at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>         at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:622)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>         at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>         at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>         at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>         at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>         at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>         at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:701)
>
> On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> > Trying out the build today
> >
> >
> > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com
> >wrote:
> >
> >> This is an issue (trivial one though) that needs to be fixed for 0.9
> >> Release, will be rerolling the release today (in the next few hrs) and
> >> putting out a new release candidate in staging.
> >>
> >> Thanks for reporting this Andrew P.
> >>
> >>
> >>
> >>
> >>
> >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> ap.dev@outlook.com>
> >> wrote:
> >>
> >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had
> >> a bit of trouble getting the Hadoop natives to compile and therefore may
> >> have run into some problems because of the Hadoop setup.  Ran into some
> >> problems in the example scripts, particularly with
> >> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the
> >> examples when I'm sure I've got Hadoop set up right.
> >>
> >>
> >> Apache Maven 3.1.2-SNAPSHOT
> >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> >> Java home: /usr/java/jdk1.6.0_45/jre
> >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> >> family: "unix"
> >> $MAHOUT_LOCAL=true
> >> Hadoop 2.2.0
> >>
> >>
> >> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
> >> [passed ]
> >>
> >> b) Verify u r able to compile the distro
> >>
> >>     mvn compile- [passed with warnings]
> >>
> >>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
> >>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala
> >> version: 2.9.3
> >>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> >> version: 2.9.2
> >>     [WARNING] Multiple versions of scala libraries detected!
> >>
> >> c)  Run through the unit tests: mvn clean test
> >>     mvn clean test [passed]
> >>
> >> d) Run the
> >>  example scripts under $MAHOUT_HOME/examples/bin.
> >> Please run through all the different options in each script
> >>
> >>     Running example scripts with $MAHOUT_LOCAL=true
> >>
> >>     ./cluster-syntheticcontrol.sh ->1 [works]
> >>     ./cluster-syntheticcontrol.sh ->2 [works]
> >>     ./cluster-syntheticcontrol.sh ->3 [works]
> >>
> >>
> >>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> >>     [...]
> >>     WARNING: Unable to add class:
> >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >>     java.lang.ClassNotFoundException:
> >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >>         at
> >>  java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >>         at java.security.AccessController.doPrivileged(Native Method)
> >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >>         at
>  java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >>         at java.lang.Class.forName0(Native Method)
> >>         at java.lang.Class.forName(Class.java:171)
> >>         at
> >> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >>         at
> >>  org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> >>
> >>
> >>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> >>
> >>     WARNING: Unable to add class:
> >>
>  org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >>     java.lang.ClassNotFoundException:
> >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >>         at java.security.AccessController.doPrivileged(Native Method)
> >>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >>         at java.lang.Class.forName0(Native Method)
> >>         at
>  java.lang.Class.forName(Class.java:171)
> >>         at
> >> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >>         at
> >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> >>     WARNING: No
> >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found
> on
> >> classpath, will use command-line arguments only
> >>     Unknown program
> >>  'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> >>
> >>
> >>     ./classify-20newsgroups.sh ->1 [works]
> >>     ./classify-20newsgroups.sh ->2 [works]
> >>
> >>
> >>     cluster-reuters.sh ->1 [works]
> >>     cluster-reuters.sh ->2 [works]
> >>     cluster-reuters.sh ->3 [works]
> >>
> >>     Same error as noted previously in the thread:
> >>
> >>     cluster-reuters.sh ->4 [0 clusters]
> >>
> >>     [...]
> >>
> >>     WARNING: No qualcluster.props found on classpath, will use
> >> command-line arguments only
> >>     Num clusters: 0; maxDistance: 0.000000
> >>     [Dunn Index]
> >>  First: Infinity
> >>     [Davies-Bouldin Index] First: NaN
> >>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> >>     INFO: Program took 669 ms (Minutes: 0.01115)
> >>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> >>
> >>
> >>
> >>
> >>
> >>
> >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> >> > From: suneel_marthi@yahoo.com
> >> > Subject: MAHOUT 0.9 Release - New URL
> >> > To: user@mahout.apache.org; dev@mahout.apache.org
> >> >
> >> > Third time's a Charm!!!
> >> >
> >> >
> >> > Here's the new URL for Mahout 0.9 Release:
> >> >
> >>
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >> >
> >> > For those volunteering to test this, some of the things to be
> verified:
> >> >
> >> > a) Verify that u can unpack the release (tar or zip)
> >> > b) Verify u r able to compile the distro
> >> > c)  Run through the unit tests: mvn clean test
> >> > d) Run the example scripts
> >>  under $MAHOUT_HOME/examples/bin. Please run through all the different
> >> options in each script.
> >> >
> >> >
> >> > Committers
> >> >  and PMC members:
> >> > ---------------------------------------
> >> >
> >> > Need 'at least 3 +1 votes' for the Release to pass.
> >> >
> >> >
> >> > Thanks and Regards.
> >>
> >
> >
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Thanks, Andrew M. I see that some of the example scripts need to be fixed, as they still refer to the deprecated algorithms.
I see that Streaming KMeans failed for you as well.

I'll be rolling back the release today to fix these issues.  





On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <an...@gmail.com> wrote:
 
Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
Linux AMI from tarball.

All tests pass.

*Output of examples:*
*asf-email-examples.sh, run on mahout.apache.org
<http://mahout.apache.org>:*
*recommendations:*
[ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
/user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
1
[21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
4
[14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
6
[5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
8       [12758:1.0,19409:1.0,11112:1.0]
11
[25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
14
[29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
15
[15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
16
[23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
18
[29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
20
[19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
[snip]

*clustering; kmeans:*
[snip]
        Weight : [props - optional]:  Point:
        1.0 : [distance-squared=1.0193102046188427]:
/commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
[1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
39789:0.110, 40743:0.190, 45775:0.086]
        1.0 : [distance-squared=0.9823018320457279]:
/commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
[1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
        1.0 : [distance-squared=0.9509142993214911]:
/commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
[648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
[snip]

*clustering; dirichlet:*
Get this complaint:
Running Dirichlet with K = 8
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
classpath, will use command-line arguments only
Unknown program 'dirichlet' chosen.

*clustering: minhash:*
Running Minhash
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
classpath, will use command-line arguments only
Unknown program 'minhash' chosen.

*classification; standard:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       5384       87.7874%
Incorrectly Classified Instances        :        749       12.2126%
Total Classified Instances              :       6133

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       <--Classified as
2949    7       531     25       |  3512        a     = dev
0       0       0       0        |  0           b     = general
99      8       1763    8        |  1878        c     = user
41      1       29      672      |  743         d     = commits

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.7877
Accuracy                                   87.7874%
Reliability                                 53.658%
Reliability (standard deviation)            0.4911
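
As a quick sanity check on the summary above, the accuracy figure can be recomputed from the confusion matrix. This is only a sketch: the matrix values are copied from the output above, the variable and function names are made up for illustration, and it assumes rows are the actual labels and columns the predicted ones (the accuracy does not depend on that choice).

```python
# Confusion matrix copied from the "classification; standard" output.
# Assumed layout: rows = actual label, columns = classified-as label,
# in the order a=dev, b=general, c=user, d=commits.
confusion = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]


def accuracy(matrix):
    """Return (correct, total, fraction correct) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
    total = sum(sum(row) for row in matrix)                  # every instance
    return correct, total, correct / total


correct, total, acc = accuracy(confusion)
print(f"{correct} / {total} = {acc:.4%}")
```

The diagonal sums to 5384 out of 6133 instances, which agrees with the 87.7874% reported in the summary.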

*classification; complementary:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       5530       90.1679%
Incorrectly Classified Instances        :        603        9.8321%
Total Classified Instances              :       6133

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       <--Classified as
3168    0       276     68       |  3512        a     = dev
0       0       0       0        |  0           b     = general
196     0       1652    30       |  1878        c     = user
25      0       8       710      |  743         d     = commits

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.8259
Accuracy                                   90.1679%
Reliability                                54.7459%
Reliability (standard deviation)            0.5005

14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes:
0.34836666666666666)

*classification; sgd, with three categories:*
Running SGD Training
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:58:00 WARN driver.MahoutDriver: No
org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
will use command-line arguments only
14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
{--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
--input=[asf-output/classification/sgd/splits/mapRedOut/],
--output=[asf-output/classification/sgd/models], --poolSize=[5],
--startPhase=[0], --tempDir=[temp], --threads=[20]}
24168 training files
0.00    0.00    0.00    0.00    0.0000000       0.0000000       1       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       2       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       3       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       4       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       6       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       8       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       10      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       12      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       15      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       20      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       25      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       30      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       40      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       50      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       60      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       70      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       80      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       100     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       120     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       140     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       150     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       200     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       250     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       300     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       400     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       500     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       600     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       700     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       800     0.000   0.00    none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1000    -0.607  75.78   none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1200    -0.607  75.78   none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1400    -0.607  75.78   none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1500    -0.607  75.78   none
0.24    43686.00        17924.00        329.50  1.0571799e-08   1.0032261e-08   2000    -0.487  82.65   none
0.24    49753.00        21610.00        330.71  1.3770070e-08   1.0011902e-08   2500    -0.439  83.90   none
0.24    49753.00        21610.00        330.71  1.3770070e-08   1.0011902e-08   3000    -0.439  83.90   none
0.32    50635.00        28531.00        437.09  1.0551175e-08   1.0000001e-08   4000    -0.351  88.14   none
0.32    50635.00        32642.00        437.09  1.0551175e-08   1.0000000e-08   5000    -0.378  87.10   none
0.32    50635.00        36461.00        437.09  1.0556652e-08   1.0000001e-08   6000    -0.372  86.89   none
0.32    50635.00        37768.00        437.09  1.0576742e-08   1.0000001e-08   7000    -0.334  89.26   none
0.32    50635.00        38807.00        437.09  1.0576742e-08   1.0000000e-08   8000    -0.368  87.52   none
0.32    50635.00        44731.00        437.09  1.0576716e-08   1.0000000e-08   10000   -0.374  87.39   none
0.32    50635.00        45672.00        437.09  1.0576716e-08   1.0000000e-08   12000   -0.298  88.26   none
Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
        at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
        at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
        at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
        at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
        at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
        at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
        at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:701)
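
One possible reading of this trace, offered as a guess and not a confirmed diagnosis: the SGD run was started with --categories=[3], while the Naive Bayes confusion matrices in the same message list four labels (dev, general, user, commits). If the gradient's target vector holds one slot fewer than the number of categories, a fourth label would write to index 2 of a two-element vector. The sketch below is plain Python, not Mahout code; the function names and the "num_categories - 1 slots" layout are assumptions made purely for illustration.

```python
# Hypothetical stand-in for a gradient target vector; NOT Mahout code.
# Assumption: the vector has (num_categories - 1) slots and category 0 is
# the implicit reference class, so label k writes to slot k - 1.
def gradient_target(actual, num_categories):
    target = [0.0] * (num_categories - 1)
    if actual != 0:
        target[actual - 1] = 1.0  # IndexError when actual >= num_categories
    return target


def try_label(actual, num_categories):
    """Return the target vector, or a short message if the label does not fit."""
    try:
        return gradient_target(actual, num_categories)
    except IndexError:
        return f"label {actual} does not fit {num_categories} categories"


in_range = try_label(2, 3)      # labels 0..2 fit three categories
out_of_range = try_label(3, 3)  # a fourth label (index 3) overflows at slot 2
print(in_range)
print(out_of_range)
```

If that reading is right, rerunning the SGD step with the category count matching the number of labels in the data would be the first thing to try.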

On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Trying out the build today
>
>
> On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <su...@yahoo.com>wrote:
>
>> This is an issue (trivial one though) that needs to be fixed for 0.9
>> Release, will be rerolling the release today (in the next few hrs) and
>> putting out a new release candidate in staging.
>>
>> Thanks for reporting this Andrew P.
>>
>>
>>
>>
>>
>> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>>
>> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had
>> a bit of trouble getting the Hadoop natives to compile and therefore may
>> have run into some problems because of the Hadoop setup.  Ran into some
>> problems in the example scripts, particularly with
>> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the
>> examples when I'm sure I've got Hadoop set up right.
>>
>>
>> Apache Maven 3.1.2-SNAPSHOT
>> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>> Java home: /usr/java/jdk1.6.0_45/jre
>> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
>> family: "unix"
>> $MAHOUT_LOCAL=true
>> Hadoop 2.2.0
>>
>>
>> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
>> [passed ]
>>
>> b) Verify u r able to compile the distro
>>
>>     mvn compile- [passed with warnings]
>>
>>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
>>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala
>> version: 2.9.3
>>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala
>> version: 2.9.2
>>     [WARNING] Multiple versions of scala libraries detected!
>>
>> c)  Run through the unit tests: mvn clean test
>>     mvn clean test [passed]
>>
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>> Please run through all the different options in each script
>>
>>     Running example scripts with $MAHOUT_LOCAL=true
>>
>>     ./cluster-syntheticcontrol.sh ->1 [works]
>>     ./cluster-syntheticcontrol.sh ->2 [works]
>>     ./cluster-syntheticcontrol.sh ->3 [works]
>>
>>
>>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>     [...]
>>     WARNING: Unable to add class:
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>     java.lang.ClassNotFoundException:
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:171)
>>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>
>>
>>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>
>>     WARNING: Unable to add class:
>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>     java.lang.ClassNotFoundException:
>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:171)
>>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>     WARNING: No
>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on
>> classpath, will use command-line arguments only
>>     Unknown program
>>  'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>>
>>
>>     ./classify-20newsgroups.sh ->1 [works]
>>     ./classify-20newsgroups.sh ->2 [works]
>>
>>
>>     cluster-reuters.sh ->1 [works]
>>     cluster-reuters.sh ->2 [works]
>>     cluster-reuters.sh ->3 [works]
>>
>>     Same error as noted previously in the thread:
>>
>>     cluster-reuters.sh ->4 [0 clusters]
>>
>>     [...]
>>
>>     WARNING: No qualcluster.props found on classpath, will use
>> command-line arguments only
>>     Num clusters: 0; maxDistance: 0.000000
>>     [Dunn Index] First: Infinity
>>     [Davies-Bouldin Index] First: NaN
>>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>     INFO: Program took 669 ms (Minutes: 0.01115)
>>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
>>
>>
>>
>>
>> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>> > From: suneel_marthi@yahoo.com
>> > Subject: MAHOUT 0.9 Release - New URL
>> > To: user@mahout.apache.org; dev@mahout.apache.org
>> >
>> > Third time's a Charm!!!
>> >
>> >
>> > Here's the new URL for Mahout 0.9 Release:
>> >
>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>> >
>> > For those volunteering to test this, some of the things to be verified:
>> >
>> > a) Verify that u can unpack the release (tar or zip)
>> > b) Verify u r able to compile the distro
>> > c)  Run through the unit tests: mvn clean test
>> > d) Run the example scripts
>>  under $MAHOUT_HOME/examples/bin. Please run through all the different
>> options in each script.
>> >
>> >
>> > Committers
>> >  and PMC members:
>> > ---------------------------------------
>> >
>> > Need 'at least 3 +1 votes' for the Release to pass.
>> >
>> >
>> > Thanks and Regards.
>>
>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
Linux AMI from tarball.

All tests pass.

*Output of examples:*
*asf-email-examples.sh, run on mahout.apache.org
<http://mahout.apache.org>:*
*recommendations:*
[ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
/user/ec2-user/asf-output/prefs/recommendations/part-r-00000  | less
1
[21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
4
[14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
6
[5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
8       [12758:1.0,19409:1.0,11112:1.0]
11
 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
14
 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
15
 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
16
 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
18
 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
19      [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
20
 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
[snip]

*clustering; kmeans:*
[snip]
        Weight : [props - optional]:  Point:
        1.0 : [distance-squared=1.0193102046188427]:
/commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
[1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
39789:0.110, 40743:0.190, 45775:0.086]
        1.0 : [distance-squared=0.9823018320457279]:
/commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
[1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
        1.0 : [distance-squared=0.9509142993214911]:
/commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
[648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
[snip]

*clustering; dirichlet:*
I get this complaint:
Running Dirichlet with K = 8
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
Unknown program 'dirichlet' chosen.

*clustering; minhash:*
Running Minhash
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
Unknown program 'minhash' chosen.
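
A possible reading of the dirichlet and minhash failures, not confirmed at this point in the thread: the example script still passes those short names to the driver, but the 0.9 job jar apparently no longer registers a program under either name. The toy model below is not MahoutDriver's code. The `REGISTERED` table and the `unknown_program_messages` helper are hypothetical, and only the message wording is taken from the logs in this message.

```python
# Hypothetical set of short program names known to the driver. The names
# listed here are placeholders for illustration only.
REGISTERED = {"kmeans", "fkmeans", "streamingkmeans"}


def unknown_program_messages(name: str, registered: set[str] = REGISTERED) -> list[str]:
    """Return the messages a driver of this shape would emit for `name`.

    A registered name produces no complaint. An unregistered name is first
    tried as a class name, then as a props file, and finally rejected.
    """
    if name in registered:
        return []
    return [
        f"WARN driver.MahoutDriver: Unable to add class: {name}",
        f"WARN driver.MahoutDriver: No {name}.props found on classpath, "
        "will use command-line arguments only",
        f"Unknown program '{name}' chosen.",
    ]


for message in unknown_program_messages("dirichlet"):
    print(message)
```

If this reading is right, the fix belongs in the example script (dropping or replacing the stale menu options) rather than in the driver.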

*classification; standard:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       5384       87.7874%
Incorrectly Classified Instances        :        749       12.2126%
Total Classified Instances              :       6133

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       <--Classified as
2949    7       531     25       |  3512        a     = dev
0       0       0       0        |  0           b     = general
99      8       1763    8        |  1878        c     = user
41      1       29      672      |  743         d     = commits

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.7877
Accuracy                                   87.7874%
Reliability                                 53.658%
Reliability (standard deviation)            0.4911
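
As a sanity check on the summary above, the accuracy figure can be recomputed from the confusion matrix alone. This is a minimal sketch: the matrix values are copied from the output above, rows are taken to be the actual classes (the row totals after the `|` agree with that reading), and the variable names are mine. Kappa and reliability are left out because the exact formulas behind the printed values are not shown in this thread.

```python
# Rows are actual classes, columns are predicted classes, in the order
# dev, general, user, commits (copied from the matrix above).
confusion = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]

# Correct predictions sit on the diagonal.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = 100.0 * correct / total

print(correct, total - correct, total)   # 5384 749 6133
print(f"{accuracy:.4f}%")                # 87.7874%
```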

*classification; complementary:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       5530       90.1679%
Incorrectly Classified Instances        :        603        9.8321%
Total Classified Instances              :       6133

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       <--Classified as
3168    0       276     68       |  3512        a     = dev
0       0       0       0        |  0           b     = general
196     0       1652    30       |  1878        c     = user
25      0       8       710      |  743         d     = commits

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.8259
Accuracy                                   90.1679%
Reliability                                54.7459%
Reliability (standard deviation)            0.5005

14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)

*classification; sgd, with three categories:*
Running SGD Training
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
24168 training files
0.00    0.00    0.00    0.00    0.0000000       0.0000000       1       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       2       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       3       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       4       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       6       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       8       0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       10      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       12      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       15      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       20      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       25      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       30      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       40      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       50      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       60      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       70      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       80      0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       100     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       120     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       140     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       150     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       200     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       250     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       300     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       400     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       500     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       600     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       700     0.000   0.00    none
0.00    0.00    0.00    0.00    0.0000000       0.0000000       800     0.000   0.00    none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1000    -0.607  75.78   none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1200    -0.607  75.78   none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1400    -0.607  75.78   none
0.13    32659.00        12672.00        82.50   1.3512194e-08   1.0019413e-08   1500    -0.607  75.78   none
0.24    43686.00        17924.00        329.50  1.0571799e-08   1.0032261e-08   2000    -0.487  82.65   none
0.24    49753.00        21610.00        330.71  1.3770070e-08   1.0011902e-08   2500    -0.439  83.90   none
0.24    49753.00        21610.00        330.71  1.3770070e-08   1.0011902e-08   3000    -0.439  83.90   none
0.32    50635.00        28531.00        437.09  1.0551175e-08   1.0000001e-08   4000    -0.351  88.14   none
0.32    50635.00        32642.00        437.09  1.0551175e-08   1.0000000e-08   5000    -0.378  87.10   none
0.32    50635.00        36461.00        437.09  1.0556652e-08   1.0000001e-08   6000    -0.372  86.89   none
0.32    50635.00        37768.00        437.09  1.0576742e-08   1.0000001e-08   7000    -0.334  89.26   none
0.32    50635.00        38807.00        437.09  1.0576742e-08   1.0000000e-08   8000    -0.368  87.52   none
0.32    50635.00        44731.00        437.09  1.0576716e-08   1.0000000e-08   10000   -0.374  87.39   none
0.32    50635.00        45672.00        437.09  1.0576716e-08   1.0000000e-08   12000   -0.298  88.26   none
Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
        at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
        at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:622)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
        at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
        at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
        at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
        at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
        at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
        at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:701)
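
One way to read this trace, offered as a guess rather than a diagnosis: the run was started with --categories=[3], while the naive Bayes confusion matrices earlier in this message list four labels (dev, general, user, commits). If the gradient vector holds one slot per category except a reference category, a fourth label would index one slot past the end, which would match the reported index 2. The sketch below reproduces that off-by-one in isolation. `gradient_target` is a hypothetical helper written for this note, not Mahout's DefaultGradient.

```python
def gradient_target(actual: int, num_categories: int) -> list[float]:
    """One-hot target over the non-reference categories.

    Category 0 is treated as the reference, so the vector has
    num_categories - 1 slots and label k (k > 0) maps to slot k - 1.
    """
    target = [0.0] * (num_categories - 1)
    if actual != 0:
        target[actual - 1] = 1.0  # fails when actual >= num_categories
    return target


print(gradient_target(2, 3))  # label index 2 fits in a 3-category model
try:
    gradient_target(3, 3)     # a fourth label (index 3) does not
except IndexError as err:
    print("IndexError:", err)
```

If that is what happened here, rerunning with a category count that covers every label in the training data would be the first thing to try.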












On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Trying out the build today
>
>
> On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <su...@yahoo.com> wrote:
>
>> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
>> release. I will be rerolling the release today (in the next few hours) and
>> putting out a new release candidate in staging.
>>
>> Thanks for reporting this Andrew P.
>>
>>
>>
>>
>>
>> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>>
>> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had
>> a bit of trouble getting the Hadoop natives to compile and therefore may
>> have run into some problems because of the Hadoop setup.  Ran into some
>> problems in the example scripts, particularly with
>> ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the
>> examples when I'm sure I've got Hadoop set up right.
>>
>>
>> Apache Maven 3.1.2-SNAPSHOT
>> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>> Java home: /usr/java/jdk1.6.0_45/jre
>> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
>> family: "unix"
>> $MAHOUT_LOCAL=true
>> Hadoop 2.2.0
>>
>>
>> a) Verify that u can unpack the release (tar or zip) [passed (tar)]
>>
>> b) Verify u r able to compile the distro
>>
>>     mvn compile [passed with warnings]
>>
>>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
>>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>>     [WARNING] Multiple versions of scala libraries detected!
>>
>> c)  Run through the unit tests: mvn clean test
>>     mvn clean test [passed]
>>
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>> Please run through all the different options in each script
>>
>>     Running example scripts with $MAHOUT_LOCAL=true
>>
>>     ./cluster-syntheticcontrol.sh ->1 [works]
>>     ./cluster-syntheticcontrol.sh ->2 [works]
>>     ./cluster-syntheticcontrol.sh ->3 [works]
>>
>>
>>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>     [...]
>>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:171)
>>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>
>>
>>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>
>>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:171)
>>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>>
>>
>>     ./classify-20newsgroups.sh ->1 [works]
>>     ./classify-20newsgroups.sh ->2 [works]
>>
>>
>>     cluster-reuters.sh ->1 [works]
>>     cluster-reuters.sh ->2 [works]
>>     cluster-reuters.sh ->3 [works]
>>
>>     Same error as noted previously in the thread:
>>
>>     cluster-reuters.sh ->4 [0 clusters]
>>
>>     [...]
>>
>>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>>     Num clusters: 0; maxDistance: 0.000000
>>     [Dunn Index] First: Infinity
>>     [Davies-Bouldin Index] First: NaN
>>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>     INFO: Program took 669 ms (Minutes: 0.01115)
>>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
>>
>>
>>
>>
>> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>> > From: suneel_marthi@yahoo.com
>> > Subject: MAHOUT 0.9 Release - New URL
>> > To: user@mahout.apache.org; dev@mahout.apache.org
>> >
>> > Third time's a Charm!!!
>> >
>> >
>> > Here's the new URL for Mahout 0.9 Release:
>> >
>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>> >
>> > For those volunteering to test this, some of the things to be verified:
>> >
>> > a) Verify that u can unpack the release (tar or zip)
>> > b) Verify u r able to compile the distro
>> > c)  Run through the unit tests: mvn clean test
>> > d) Run the example scripts
>>  under $MAHOUT_HOME/examples/bin. Please run through all the different
>> options in each script.
>> >
>> >
>> > Committers
>> >  and PMC members:
>> > ---------------------------------------
>> >
>> > Need 'at least 3 +1 votes' for the Release to pass.
>> >
>> >
>> > Thanks and Regards.
>>
>
>

Re: MAHOUT 0.9 Release - New URL

Posted by Andrew Musselman <an...@gmail.com>.
Trying out the build today


On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <su...@yahoo.com> wrote:

> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
> release. I will be rerolling the release today (in the next few hours) and
> putting out a new release candidate in staging.
>
> Thanks for reporting this Andrew P.
>
>
>
>
>
> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had a
> bit of trouble getting the Hadoop natives to compile and therefore may have
> run into some problems because of the Hadoop setup.  Ran into some problems
> in the example scripts, particularly with ./cluster-syntheticcontrol.sh
> ->4,5.  I will run through the rest of the examples when I'm sure I've got
> Hadoop set up right.
>
>
> Apache Maven 3.1.2-SNAPSHOT
> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> Java home: /usr/java/jdk1.6.0_45/jre
> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> family: "unix"
> $MAHOUT_LOCAL=true
> Hadoop 2.2.0
>
>
> a) Verify that u can unpack the release (tar or zip) [passed (tar)]
>
> b) Verify u r able to compile the distro
>
>     mvn compile [passed with warnings]
>
>     [WARNING]  Expected all dependencies to require Scala version: 2.9.3
>     [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>     [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>     [WARNING] Multiple versions of scala libraries detected!
>
> c)  Run through the unit tests: mvn clean test
>     mvn clean test [passed]
>
> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> Please run through all the different options in each script
>
>     Running example scripts with $MAHOUT_LOCAL=true
>
>     ./cluster-syntheticcontrol.sh ->1 [works]
>     ./cluster-syntheticcontrol.sh ->2 [works]
>     ./cluster-syntheticcontrol.sh ->3 [works]
>
>
>     ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>     [...]
>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:171)
>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>     Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>
>
>     ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>
>     WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>     java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:171)
>         at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>     Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>     WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>     Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>
>
>     ./classify-20newsgroups.sh ->1 [works]
>     ./classify-20newsgroups.sh ->2 [works]
>
>
>     cluster-reuters.sh ->1 [works]
>     cluster-reuters.sh ->2 [works]
>     cluster-reuters.sh ->3 [works]
>
>     Same error as noted previously in the thread:
>
>     cluster-reuters.sh ->4 [0 clusters]
>
>     [...]
>
>     WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>     Num clusters: 0; maxDistance: 0.000000
>     [Dunn Index] First: Infinity
>     [Davies-Bouldin Index] First: NaN
>     Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>     INFO: Program took 669 ms (Minutes: 0.01115)
>     cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>
>
>
>
>
>
> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: MAHOUT 0.9 Release - New URL
> > To: user@mahout.apache.org; dev@mahout.apache.org
> >
> > Third time's a Charm!!!
> >
> >
> > Here's the new URL for Mahout 0.9 Release:
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >
> > For those volunteering to test this, some of the things to be verified:
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c)  Run through the unit tests: mvn clean test
> > d) Run the example scripts
>  under $MAHOUT_HOME/examples/bin. Please run through all the different
> options in each script.
> >
> >
> > Committers
> >  and PMC members:
> > ---------------------------------------
> >
> > Need 'at least 3 +1 votes' for the Release to pass.
> >
> >
> > Thanks and Regards.
>

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
This is an issue (a trivial one, though) that needs to be fixed for the 0.9 release. I will be rerolling the release today (in the next few hours) and putting out a new release candidate in staging.

Thanks for reporting this Andrew P. 





On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com> wrote:
 
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the Hadoop setup.  Ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the examples when I'm sure I've got Hadoop set up right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) [passed (tar)]

b) Verify u r able to compile the distro

    mvn compile [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin.
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh ->1 [works]
    ./cluster-syntheticcontrol.sh ->2 [works]
    ./cluster-syntheticcontrol.sh ->3 [works]


    ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh ->1 [works]
    ./classify-20newsgroups.sh ->2 [works]


    cluster-reuters.sh ->1 [works]
    cluster-reuters.sh ->2 [works]
    cluster-reuters.sh ->3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh ->4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.000000
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train






> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL 
> To: user@mahout.apache.org; dev@mahout.apache.org
> 
> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> 
> For those volunteering to test this, some of the things to be verified:
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts
 under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>      
> 
> Committers
>  and PMC members:
> ---------------------------------------
> 
> Need 'at least 3 +1 votes' for the Release to pass. 
> 
> 
> Thanks and Regards.

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
This is an issue (a trivial one, though) that needs to be fixed for the 0.9 release. I will be rerolling the release today (in the next few hours) and putting out a new release candidate in staging.

Thanks for reporting this Andrew P. 





On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com> wrote:
 
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the Hadoop setup.  Ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the examples when I'm sure I've got Hadoop set up right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) [passed (tar)]

b) Verify u r able to compile the distro

    mvn compile [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin.
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh ->1 [works]
    ./cluster-syntheticcontrol.sh ->2 [works]
    ./cluster-syntheticcontrol.sh ->3 [works]


    ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh ->1 [works]
    ./classify-20newsgroups.sh ->2 [works]


    cluster-reuters.sh ->1 [works]
    cluster-reuters.sh ->2 [works]
    cluster-reuters.sh ->3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh ->4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.000000
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train






> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL 
> To: user@mahout.apache.org; dev@mahout.apache.org
> 
> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> 
> For those volunteering to test this, some of the things to be verified:
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>      
> 
> Committers and PMC members:
> ---------------------------------------
> 
> Need 'at least 3 +1 votes' for the Release to pass. 
> 
> 
> Thanks and Regards.

Re: MAHOUT 0.9 Release - New URL

Posted by Suneel Marthi <su...@yahoo.com>.
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have been removed from 0.9, cluster-syntheticcontrol.sh options 4 and 5 are not going to work and should have been removed from the script for 0.9.
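
For anyone who wants to confirm this kind of breakage before running a script, here is a minimal sketch of a classpath check. It is not Mahout code: the class name MissingJobCheck and the method findMissing are made up for illustration, and the only thing taken from the report is the two job class names. It simply asks the class loader to resolve each name and collects the ones that fail, which is the same ClassNotFoundException path that shows up in the stack traces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: reports which class names
// cannot be resolved from the classpath this program is started with.
public class MissingJobCheck {

    /** Returns the names from classNames that the class loader cannot find. */
    public static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<String>();
        ClassLoader loader = MissingJobCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: resolve the class without running its static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The two job classes named in the failing options 4 and 5 of the report.
        List<String> jobs = Arrays.asList(
            "org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job",
            "org.apache.mahout.clustering.syntheticcontrol.meanshift.Job");
        for (String name : findMissing(jobs)) {
            System.out.println("Not on classpath: " + name);
        }
    }
}
```

To get a meaningful answer, it would have to be run with the Mahout examples job jar on the classpath; run without any Mahout jars, it reports both names as missing.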

To PMC,

 -> Should we roll back the release, fix this issue (along with the other patches submitted in the last few days), and put out another release?







On Monday, January 20, 2014 12:33 AM, Andrew Palumbo <ap...@outlook.com> wrote:
 
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  I had a bit of trouble getting the Hadoop natives to compile, so some of the problems below may be due to my Hadoop setup.  I ran into problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the examples when I'm sure I've got Hadoop set up right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) ... passed (tar) [passed]

b) Verify u r able to compile the distro

    mvn compile [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. 
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh ->1 [works]
    ./cluster-syntheticcontrol.sh ->2 [works]
    ./cluster-syntheticcontrol.sh ->3 [works]


    ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh ->1 [works]
    ./classify-20newsgroups.sh ->2 [works]


    cluster-reuters.sh ->1 [works]
    cluster-reuters.sh ->2 [works]
    cluster-reuters.sh ->3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh ->4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.000000
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train






> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL 
> To: user@mahout.apache.org; dev@mahout.apache.org
> 
> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> 
> For those volunteering to test this, some of the things to be verified:
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>      
> 
> Committers and PMC members:
> ---------------------------------------
> 
> Need 'at least 3 +1 votes' for the Release to pass. 
> 
> 
> Thanks and Regards.

RE: MAHOUT 0.9 Release - New URL

Posted by Andrew Palumbo <ap...@outlook.com>.
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM).  I had a bit of trouble getting the Hadoop natives to compile, so some of the problems below may be due to my Hadoop setup.  I ran into problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5.  I will run through the rest of the examples when I'm sure I've got Hadoop set up right.


Apache Maven 3.1.2-SNAPSHOT 
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0


a) Verify that u can unpack the release (tar or zip) ... passed (tar) [passed]

b) Verify u r able to compile the distro

    mvn compile [passed with warnings]

    [WARNING]  Expected all dependencies to require Scala version: 2.9.3
    [WARNING]  org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
    [WARNING]  org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
    [WARNING] Multiple versions of scala libraries detected!

c)  Run through the unit tests: mvn clean test
    mvn clean test [passed]

d) Run the example scripts under $MAHOUT_HOME/examples/bin. 
Please run through all the different options in each script

    Running example scripts with $MAHOUT_LOCAL=true

    ./cluster-syntheticcontrol.sh ->1 [works]
    ./cluster-syntheticcontrol.sh ->2 [works]
    ./cluster-syntheticcontrol.sh ->3 [works]


    ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
    [...]
    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn


    ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]

    WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
    java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:171)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
    WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
    Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.


    ./classify-20newsgroups.sh ->1 [works]
    ./classify-20newsgroups.sh ->2 [works]


    cluster-reuters.sh ->1 [works]
    cluster-reuters.sh ->2 [works]
    cluster-reuters.sh ->3 [works]
    
    Same error as noted previously in the thread:

    cluster-reuters.sh ->4 [0 clusters]

    [...]

    WARNING: No qualcluster.props found on classpath, will use command-line arguments only
    Num clusters: 0; maxDistance: 0.000000
    [Dunn Index] First: Infinity
    [Davies-Bouldin Index] First: NaN
    Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
    INFO: Program took 669 ms (Minutes: 0.01115)
    cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
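
A note on why the two indices print as Infinity and NaN when there are 0 clusters. The sketch below is only an assumption about how such values can arise from plain floating-point arithmetic. It is not the code Mahout runs for this step, and the class and method names (EmptyClusteringMetrics, dunnIndex, meanScore) are invented for the example. The Dunn index is the smallest inter-cluster distance divided by the largest intra-cluster distance; the Davies-Bouldin index averages a per-cluster score. With no clusters, a running minimum that starts at +Infinity and a running maximum that starts at 0 are never updated, and an average over zero items divides 0.0 by 0.

```java
// Hypothetical illustration, not Mahout's implementation: shows how
// empty inputs turn two common cluster-quality formulas into Infinity and NaN.
public class EmptyClusteringMetrics {

    /** Smallest inter-cluster distance divided by largest intra-cluster distance. */
    public static double dunnIndex(double[] interClusterDistances, double[] intraClusterDistances) {
        double minInter = Double.POSITIVE_INFINITY;   // stays +Infinity if there is nothing to compare
        for (double d : interClusterDistances) {
            minInter = Math.min(minInter, d);
        }
        double maxIntra = 0.0;                        // stays 0.0 if there are no clusters
        for (double d : intraClusterDistances) {
            maxIntra = Math.max(maxIntra, d);
        }
        return minInter / maxIntra;                   // Infinity / 0.0 is Infinity
    }

    /** Plain average of per-cluster scores, standing in for the Davies-Bouldin averaging step. */
    public static double meanScore(double[] perClusterScores) {
        double sum = 0.0;
        for (double s : perClusterScores) {
            sum += s;
        }
        return sum / perClusterScores.length;         // 0.0 / 0 is NaN in floating point
    }

    public static void main(String[] args) {
        double[] none = new double[0];
        System.out.println("[Dunn Index] " + dunnIndex(none, none));
        System.out.println("[Davies-Bouldin Index] " + meanScore(none));
    }
}
```

If something like this is what happens, the Infinity and NaN values are just a symptom. The thing to chase is why "Num clusters" is 0 in the first place.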





> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL 
> To: user@mahout.apache.org; dev@mahout.apache.org
> 
> Third time's a Charm!!!
> 
> 
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> 
> For those volunteering to test this, some of the things to be verified:
> 
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c)  Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>      
> 
> Committers and PMC members:
> ---------------------------------------
> 
> Need 'at least 3 +1 votes' for the Release to pass. 
> 
> 
> Thanks and Regards.