Posted to user@mahout.apache.org by Suneel Marthi <su...@yahoo.com> on 2014/01/16 15:41:09 UTC
MAHOUT 0.9 Release - New URL
Third time's a Charm!!!
Here's the new URL for Mahout 0.9 Release:
https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
For those volunteering to test this, some of the things to be verified:
a) Verify that you can unpack the release (tar or zip)
b) Verify that you are able to compile the distro
c) Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
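Steps a) to c) above can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the archive file name passed in is hypothetical, the unpacked directory is assumed to be named after the archive, and `run` and `verify_release` are helper names made up for this sketch. With DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of checklist steps a) to c). With DRY_RUN=1 (the default here) each
# command is only printed, so the script is safe to try without a download.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# verify_release ARCHIVE
# ARCHIVE is a hypothetical local copy of the source archive taken from the
# staging URL; the real file name may differ.
verify_release() {
  archive="$1"
  case "$archive" in
    *.tar.gz) run tar -xzf "$archive" || return 1; dir="${archive%.tar.gz}" ;;
    *.zip)    run unzip -q "$archive" || return 1; dir="${archive%.zip}" ;;
    *)        echo "unsupported archive: $archive" >&2; return 1 ;;
  esac
  # Assumption: the archive unpacks into a directory named after the archive.
  run cd "$dir" || return 1
  # In the Maven lifecycle the test phase runs after compile, so this single
  # command exercises both b) and c).
  run mvn clean test
}
```

Calling `verify_release` with a local archive name prints the three commands in order; setting DRY_RUN=0 would execute them instead. Step d) is left out because the example scripts are interactive.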
Committers and PMC members:
---------------------------------------
Need 'at least 3 +1 votes' for the Release to pass.
Thanks and Regards.
Re: MAHOUT 0.9 Release - New URL
Posted by Shannon Quinn <sq...@gatech.edu>.
a), b), and c) all pass for me. Don't have the setup yet at work to go
through d), will wait for others to verify.
On 1/16/14, 9:41 AM, Suneel Marthi wrote:
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that you can unpack the release (tar or zip)
> b) Verify that you are able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
>
> Committers
> and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Stevo, could you test streaming k-means?
> On Jan 19, 2014, at 8:10 PM, Stevo Slavić <ss...@gmail.com> wrote:
>
> +1 (binding)
>
>
>> On Sun, Jan 19, 2014 at 7:49 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>>
>> I'll try to test out soon
>>
Re: MAHOUT 0.9 Release - New URL
Posted by Stevo Slavić <ss...@gmail.com>.
+1 (binding)
On Sun, Jan 19, 2014 at 7:49 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> I'll try to test out soon
>
Re: MAHOUT 0.9 Release - New URL
Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I'll try to test out soon
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
The reason you are seeing the error is that there were no sequence files in HDFS in MR mode to begin with, hence no term vectors were generated, and hence there were no vectors to cluster.
MR mode:
------------
1. Set HADOOP_HOME
2. Unset MAHOUT_LOCAL
3. Clean up your local /tmp/mahout-work-xxxxx directory
4. Run ./examples/bin/cluster-reuters.sh and choose option 4
Sequential Mode:
---------------------
1. Set MAHOUT_LOCAL=true
2. Add the "-xm sequential" flag to the cluster-reuters.sh script
3. Run ./examples/bin/cluster-reuters.sh and choose option 4
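As a rough sketch of the two setups above, the function below only reports which mode the current environment would select; it does not run Mahout. The function name `mahout_mode` is made up for this illustration, and the rule it encodes (a non-empty MAHOUT_LOCAL means local mode, otherwise HADOOP_HOME is needed for MR mode) is an assumption based on the steps listed in this message, not a copy of what the launcher script does.

```shell
#!/bin/sh
# Report which mode the environment is set up for, following the two recipes
# in this message. Nothing is executed against Hadoop or Mahout.
# Assumption: a non-empty MAHOUT_LOCAL selects local mode; MR mode needs
# MAHOUT_LOCAL unset and HADOOP_HOME set.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    # Per this thread, the script must also pass "-xm sequential" in this case.
    echo "local"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    echo "mapreduce"
  else
    echo "unconfigured"
  fi
}
```

Running `mahout_mode` before ./examples/bin/cluster-reuters.sh is a quick way to check that MAHOUT_LOCAL was not left over from an earlier shell session, which is the situation discussed further down in this thread.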
On Sunday, January 19, 2014 12:22 PM, Frank Scholten <fr...@frankscholten.nl> wrote:
When I run in MR mode I get the same problem.
See http://pastebin.com/TXJ5mQmt
On Sun, Jan 19, 2014 at 5:31 PM, Frank Scholten <fr...@frankscholten.nl> wrote:
OK, running in MR mode now.
>
>
>
>
>On Sun, Jan 19, 2014 at 5:30 PM, Suneel Marthi <su...@yahoo.com> wrote:
>
>It's presently set up to run in MR mode (the way it's been coded in cluster-reuters.sh), so setting MAHOUT_LOCAL=true is going to fail for this.
>>I am able to see this fail locally when MAHOUT_LOCAL=true.
>>
>>
>>
>>
>>
>>
>>On Sunday, January 19, 2014 11:17 AM, Frank Scholten <fr...@frankscholten.nl> wrote:
>>
>>Exported MAHOUT_LOCAL=true and still get the same results.
>>
>>
>>
>>On Sun, Jan 19, 2014 at 5:00 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>
>>> Frank,
>>>
>>> Were you running this with MAHOUT_LOCAL=true?
>>>
>>>
>>>
>>>
>>>
>>> On Sunday, January 19, 2014 10:29 AM, Frank Scholten <
>>> frank@frankscholten.nl> wrote:
>>>
>>> -1
>>>
>>> The cluster reuters example results in zero clusters when choosing
>>> streaming k-means. The other steps, unpacking and building do work.
>>>
>>> I see this stacktrace:
>>>
>>> INFO: Number of Centroids: 0
>>> Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
>>> WARNING: job_local797072544_0001
>>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>>> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>>> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>>
>>> Num clusters: 0; maxDistance: 0.000000
>>> [Dunn Index] First: Infinity
>>> [Davies-Bouldin Index] First: NaN
>>> Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 278 ms (Minutes: 0.004633333333333333)
>>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>
>>>
>>> Here is the full log: http://pastebin.com/TxLV0rDr
>>>
>>> As of yet I am unfamiliar with the streaming k-means code and the
>>> algorithms behind it. If anyone has a suggestion on what goes wrong in the
>>> code, I am happy to help where I can.
>>>
>>>
>>> Frank
>>>
>>>
>>>
>>> On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com>
>>> wrote:
>>>
>>> Thanks Grant.
>>> >
>>> >Not sure if I can vote given my role as the BuildMeister/ReleaseMeister
>>> for 0.9.
>>> >Here's my +1 FWIW.
>>> >
>>> >a) Attached is the draft of the Release notes for 0.9, would definitely
>>> appreciate feedback on that.
>>> >
>>> >b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if
>>> >a majority of at least 3 +1 PMC votes are cast.
>>> >
>>> >The release files, including signatures, digests, etc can be found at:
>>> >
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >
>>> >The staging repository for this release can be found at:
>>> >https://repository.apache.org/content/repositories/orgapachemahout-1002
>>> >
>>> >Release artifacts have been signed with the following key:
>>> >https://people.apache.org/keys/committer/smarthi.asc
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <
>>> gsingers@apache.org> wrote:
>>> >
>>> >Ran the tests, verified sigs, tried out a few of the examples.
>>> >
>>> >+1 (binding)
>>> >
>>> >
>>> >On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com>
>>> wrote:
>>> >
>>> >> Third time's a Charm!!!
>>> >>
>>> >>
>>> >> Here's the new URL for Mahout 0.9 Release:
>>> >>
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> >>
>>> >> For those volunteering to test this, some of the things to be verified:
>>> >>
>>> >> a) Verify that you can unpack the release (tar or zip)
>>> >> b) Verify that you are able to compile the distro
>>> >> c) Run through the unit tests: mvn clean test
>>> >> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>>> >>
>>> >>
>>> >> Committers
>>> >> and PMC members:
>>> >> ---------------------------------------
>>> >>
>>> >> Need 'at least 3 +1 votes' for the Release to pass.
>>> >>
>>> >>
>>> >> Thanks and Regards.
>>> >
>>> >
>>> >
>>> >
>>>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Frank Scholten <fr...@frankscholten.nl>.
When I run in MR mode I get the same problem.
See http://pastebin.com/TXJ5mQmt
On Sun, Jan 19, 2014 at 5:31 PM, Frank Scholten <fr...@frankscholten.nl>wrote:
> OK, running in MR mode now.
Re: MAHOUT 0.9 Release - New URL
Posted by Frank Scholten <fr...@frankscholten.nl>.
OK, running in MR mode now.
On Sun, Jan 19, 2014 at 5:30 PM, Suneel Marthi <su...@yahoo.com>wrote:
> It's presently set up to run in MR mode (the way it's been coded in
> cluster-reuters.sh), so setting MAHOUT_LOCAL=true is going to fail for this.
> I am able to see this fail locally when MAHOUT_LOCAL=true.
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
It works when both MAHOUT_LOCAL=true and the '-xm sequential' option are set.
I guess we will have to cut a release again with the '-xm sequential' option set.
On Sunday, January 19, 2014 11:31 AM, Suneel Marthi <su...@yahoo.com> wrote:
It's presently set up to run in MR mode (the way it's been coded in cluster-reuters.sh), so setting MAHOUT_LOCAL=true is going to fail for this.
I am able to see this fail locally when MAHOUT_LOCAL=true.
On Sunday, January 19, 2014 11:17 AM, Frank Scholten <fr...@frankscholten.nl> wrote:
Exported MAHOUT_LOCAL=true and still get the same results.
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
It's presently set up to run in MR mode (the way it's been coded in cluster-reuters.sh), so setting MAHOUT_LOCAL=true is going to fail for this.
I am able to see this fail locally when MAHOUT_LOCAL=true.
On Sunday, January 19, 2014 11:17 AM, Frank Scholten <fr...@frankscholten.nl> wrote:
Exported MAHOUT_LOCAL=true and still get the same results.
Re: MAHOUT 0.9 Release - New URL
Posted by Frank Scholten <fr...@frankscholten.nl>.
Exported MAHOUT_LOCAL=true and still get the same results.
On Sun, Jan 19, 2014 at 5:00 PM, Suneel Marthi <su...@yahoo.com>wrote:
> Frank,
>
> Were you running this with MAHOUT_LOCAL=true?
>
>
>
>
>
> On Sunday, January 19, 2014 10:29 AM, Frank Scholten <
> frank@frankscholten.nl> wrote:
>
> -1
>
> The cluster reuters example results in zero clusters when choosing
> streaming k-means. The other steps, unpacking and building do work.
>
> I see this stacktrace:
>
> INFO: Number of Centroids: 0
> Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local797072544_0001
> java.lang.IllegalArgumentException: Must have nonzero number of training
> and test vectors. Asked for %.1f %% of %d vectors for test
> [10.000000149011612, 0]
> at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> at
> org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> at
> org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> at
> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> at
> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> at
> org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>
> Num clusters: 0; maxDistance: 0.000000
> [Dunn Index] First: Infinity
> [Davies-Bouldin Index] First: NaN
> Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 278 ms (Minutes: 0.004633333333333333)
> cluster,distance.mean,distance.sd
> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>
>
> Here is the full log: http://pastebin.com/TxLV0rDr
>
> As of yet I am unfamiliar with the streaming k-means code and the
> algorithms behind it. If anyone has suggestion on what goes wrong in the
> code I am I happy to help where I can.
>
>
> Frank
>
>
>
> On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Thanks Grant.
> >
> >Not sure if I can vote given my role as the BuildMeister/ReleaseMeister
> for 0.9.
> >Here's my +1 FWIW.
> >
> >a) Attached is the draft of the Release notes for 0.9, would definitely
> appreciate feedback on that.
> >
> >b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if
> a majority of atleast 3 +1 PMC votes are cast.
> >
> >The release files, including signatures, digests, etc can be found at:
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >
> >The staging repository for this release can be found at:
> >https://repository.apache.org/content/repositories/orgapachemahout-1002
> >
> >Release artifacts have been signed with the following key:
> >https://people.apache.org/keys/committer/smarthi.asc
> >
> >
> >
> >
> >
> >
> >
> >
> >On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <
> gsingers@apache.org> wrote:
> >
> >Ran the tests, verified sigs, tried out a few of the examples.
> >
> >+1 (binding)
> >
> >
> >On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com>
> wrote:
> >
> >> Third time's a Charm!!!
> >>
> >>
> >> Here's the new URL for Mahout 0.9 Release:
> >>
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >>
> >> For those volunteering to test this, some of the things to be verified:
> >>
> >> a) Verify that u can unpack the release (tar or zip)
> >> b) Verify u r able to compile the distro
> >> c) Run through the unit tests: mvn clean test
> >> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
> >>
> >>
> >> Committers
> >> and PMC members:
> >> ---------------------------------------
> >>
> >> Need 'at least 3 +1 votes' for the Release to pass.
> >>
> >>
> >> Thanks and Regards.
> >
> >
> >
> >
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Frank,
Were u running this with MAHOUT_LOCAL=true?
On Sunday, January 19, 2014 10:29 AM, Frank Scholten <fr...@frankscholten.nl> wrote:
-1
The cluster reuters example results in zero clusters when choosing streaming k-means. The other steps, unpacking and building do work.
I see this stacktrace:
INFO: Number of Centroids: 0
Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local797072544_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 278 ms (Minutes: 0.004633333333333333)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
Here is the full log: http://pastebin.com/TxLV0rDr
As of yet I am unfamiliar with the streaming k-means code and the algorithms behind it. If anyone has a suggestion on what goes wrong in the code, I am happy to help where I can.
Frank
On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com> wrote:
Thanks Grant.
>
>Not sure if I can vote given my role as the BuildMeister/ReleaseMeister for 0.9.
>Here's my +1 FWIW.
>
>a) Attached is the draft of the Release notes for 0.9, would definitely appreciate feedback on that.
>
>b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if a majority of at least 3 +1 PMC votes are cast.
>
>The release files, including signatures, digests, etc can be found at:
>https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
>The staging repository for this release can be found at:
>https://repository.apache.org/content/repositories/orgapachemahout-1002
>
>Release artifacts have been signed with the following key:
>https://people.apache.org/keys/committer/smarthi.asc
>
>
>
>
>
>
>
>
>On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
>Ran the tests, verified sigs, tried out a few of the examples.
>
>+1 (binding)
>
>
>On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com> wrote:
>
>> Third time's a Charm!!!
>>
>>
>> Here's the new URL for Mahout 0.9 Release:
>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>
>> For those volunteering to test this, some of the things to be verified:
>>
>> a) Verify that u can unpack the release (tar or zip)
>> b) Verify u r able to compile the distro
>> c) Run through the unit tests: mvn clean test
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>>
>>
>> Committers
>> and PMC members:
>> ---------------------------------------
>>
>> Need 'at least 3 +1 votes' for the Release to pass.
>>
>>
>> Thanks and Regards.
>
>
>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Frank Scholten <fr...@frankscholten.nl>.
-1
The cluster reuters example results in zero clusters when choosing
streaming k-means. The other steps, unpacking and building do work.
I see this stacktrace:
INFO: Number of Centroids: 0
Jan 19, 2014 3:51:08 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local797072544_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 3:51:09 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 278 ms (Minutes: 0.004633333333333333)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
Here is the full log: http://pastebin.com/TxLV0rDr
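A side note on the exception text above: the raw "%.1f %% of %d" placeholders followed by the bracketed values match how Guava's Preconditions.checkArgument treats its message template, since it substitutes only "%s" and appends leftover arguments in square brackets. The following is a minimal sketch of that lenient behaviour next to String.format. LenientFormat is a hypothetical stand-in written for illustration; it is not Guava's or Mahout's code.

```java
import java.util.Locale;

/** Hypothetical stand-in that imitates a "%s"-only message template. */
final class LenientFormat {

    /** Replaces each "%s" with the next argument; unused arguments go into a trailing "[...]". */
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        int used = 0;
        while (used < args.length) {
            int placeholder = template.indexOf("%s", start);
            if (placeholder == -1) {
                break; // no more "%s" markers; "%d" and "%.1f" are left untouched
            }
            out.append(template, start, placeholder).append(args[used++]);
            start = placeholder + 2;
        }
        out.append(template, start, template.length());
        if (used < args.length) {
            out.append(" [").append(args[used++]);
            while (used < args.length) {
                out.append(", ").append(args[used++]);
            }
            out.append(']');
        }
        return out.toString();
    }

    /** The same message with the printf-style template actually applied. */
    static String printfStyle(double testPercent, int vectorCount) {
        return String.format(Locale.ROOT,
            "Asked for %.1f %% of %d vectors for test", testPercent, vectorCount);
    }

    public static void main(String[] args) {
        String template = "Asked for %.1f %% of %d vectors for test";
        System.out.println(format(template, 10.000000149011612, 0));
        System.out.println(printfStyle(10.000000149011612, 0));
    }
}
```

Either way the message points at the same condition: the reducer was asked to split 0 vectors into training and test sets.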
As of yet I am unfamiliar with the streaming k-means code and the
algorithms behind it. If anyone has a suggestion on what goes wrong in the
code, I am happy to help where I can.
Frank
On Sun, Jan 19, 2014 at 10:55 AM, Suneel Marthi <su...@yahoo.com> wrote:
> Thanks Grant.
>
> Not sure if I can vote given my role as the BuildMeister/ReleaseMeister
> for 0.9.
> Here's my +1 FWIW.
>
> a) Attached is the draft of the Release notes for 0.9, would definitely
> appreciate feedback on that.
>
> b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if a
> majority of at least 3 +1 PMC votes are cast.
>
> The release files, including signatures, digests, etc can be found at:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachemahout-1002
>
> Release artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
>
>
>
>
>
>
> On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <
> gsingers@apache.org> wrote:
> Ran the tests, verified sigs, tried out a few of the examples.
>
> +1 (binding)
>
> On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> > Third time's a Charm!!!
> >
> >
> > Here's the new URL for Mahout 0.9 Release:
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >
> > For those volunteering to test this, some of the things to be verified:
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c) Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
> >
> >
> > Committers
> > and PMC members:
> > ---------------------------------------
> >
> > Need 'at least 3 +1 votes' for the Release to pass.
> >
> >
> > Thanks and Regards.
>
>
>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Grant.
Not sure if I can vote given my role as the BuildMeister/ReleaseMeister for 0.9.
Here's my +1 FWIW.
a) Attached is the draft of the Release notes for 0.9, would definitely appreciate feedback on that.
b) The vote is open until Monday, Jan 20, 2014 11:59PM EST and passes if a majority of at least 3 +1 PMC votes are cast.
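Read literally, the rule in b) has two parts: at least three +1 votes from PMC members, and more +1 than -1 votes among them. A minimal sketch of that check follows. ReleaseVote is a hypothetical helper, and reading "majority" as "more binding +1 than binding -1" is an assumption rather than something this thread spells out.

```java
/** Hypothetical helper encoding the pass rule stated in b). */
final class ReleaseVote {

    /** Minimum number of binding +1 votes, taken from the announcement. */
    static final int MIN_BINDING_PLUS_ONES = 3;

    /** True when there are enough binding +1 votes and they outnumber the binding -1 votes. */
    static boolean passes(int bindingPlusOnes, int bindingMinusOnes) {
        return bindingPlusOnes >= MIN_BINDING_PLUS_ONES
            && bindingPlusOnes > bindingMinusOnes;
    }

    public static void main(String[] args) {
        System.out.println(passes(3, 0)); // enough +1 votes, no objections
        System.out.println(passes(2, 0)); // too few binding +1 votes
        System.out.println(passes(3, 3)); // not a majority
    }
}
```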
The release files, including signatures, digests, etc can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachemahout-1002
Release artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc
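For testers who want to check a downloaded artifact against its published digest, here is a minimal sketch. It assumes a SHA-1 digest file sits next to the artifact (the digest algorithm is an assumption, the announcement only says "digests"), and DigestCheck is a hypothetical helper name. It does not replace verifying the GPG signature against the key above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for comparing artifact bytes with a published SHA-1 digest. */
final class DigestCheck {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** Compares the computed digest with the published one, ignoring case and surrounding whitespace. */
    static boolean matches(byte[] data, String publishedDigest) {
        return sha1Hex(data).equalsIgnoreCase(publishedDigest.trim());
    }

    public static void main(String[] args) {
        byte[] sample = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sha1Hex(sample));
    }
}
```

In practice the bytes would come from the downloaded tar or zip (for example via java.nio.file.Files.readAllBytes) and the expected value from the digest file in the staging repository.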
On Saturday, January 18, 2014 12:27 PM, Grant Ingersoll <gs...@apache.org> wrote:
Ran the tests, verified sigs, tried out a few of the examples.
+1 (binding)
On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com> wrote:
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
>
> Committers
> and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
Re: MAHOUT 0.9 Release - New URL
Posted by Grant Ingersoll <gs...@apache.org>.
Ran the tests, verified sigs, tried out a few of the examples.
+1 (binding)
On Jan 16, 2014, at 9:41 AM, Suneel Marthi <su...@yahoo.com> wrote:
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
>
> Committers
> and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
Re: MAHOUT 0.9 Release - New URL
Posted by Shannon Quinn <sq...@gatech.edu>.
OS X 10.9.1, java version 1.6.0_65.
On 1/16/14, 10:41 AM, Sergey Svinarchuk wrote:
> I tested mahout 0.9 on Ubuntu 12.04 64bit, java version "1.6.0_27"
>
> a) Verify that u can unpack the release (tar or zip) - passed
> b) Verify u r able to compile the distro - passed
> c) Run through the unit tests: mvn clean test -passed
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script. - will update later
>
>
> On Thu, Jan 16, 2014 at 5:35 PM, Sotiris Salloumis <in...@eprice.gr> wrote:
>
>> Hi Suneel,
>>
>> Below first round of tests,
>>
>> Environment: SMP Debian 3.2.51-1 x86_64
>> Machine: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz stepping 05 12GB
>> RAM
>> OpenJDK: javac 1.6.0_27
>>
>> a) Verify that u can unpack the release (tar or zip) [ Passed: tar -zxvf ]
>> b) Verify u r able to compile the distro [ Passed: With OpenJDK, Latest
>> Maven on LatestDebian ]
>> c) Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
>>
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>> through all the different options in each script. [Ongoing will update
>> later]
>>
>> Regards
>> Sotiris
>>
>> -----Original Message-----
>> From: Suneel Marthi [mailto:suneel_marthi@yahoo.com]
>> Sent: Thursday, January 16, 2014 4:41 PM
>> To: user@mahout.apache.org; mahout
>> Subject: MAHOUT 0.9 Release - New URL
>>
>> Third time's a Charm!!!
>>
>>
>> Here's the new URL for Mahout 0.9 Release:
>>
>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/
>> apache/mahout/mahout-distribution/0.9/
>>
>> For those volunteering to test this, some of the things to be verified:
>>
>> a) Verify that u can unpack the release (tar or zip)
>> b) Verify u r able to compile the distro
>> c) Run through the unit tests: mvn clean test
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
>> through all the different options in each script.
>>
>>
>> Committers
>> and PMC members:
>> ---------------------------------------
>>
>> Need 'at least 3 +1 votes' for the Release to pass.
>>
>>
>> Thanks and Regards.
>>
>>
Re: MAHOUT 0.9 Release - New URL
Posted by Sergey Svinarchuk <ss...@hortonworks.com>.
I tested mahout 0.9 on Ubuntu 12.04 64bit, java version "1.6.0_27"
a) Verify that u can unpack the release (tar or zip) - passed
b) Verify u r able to compile the distro - passed
c) Run through the unit tests: mvn clean test -passed
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
through all the different options in each script. - will update later
On Thu, Jan 16, 2014 at 5:35 PM, Sotiris Salloumis <in...@eprice.gr> wrote:
> Hi Suneel,
>
> Below first round of tests,
>
> Environment: SMP Debian 3.2.51-1 x86_64
> Machine: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz stepping 05 12GB
> RAM
> OpenJDK: javac 1.6.0_27
>
> a) Verify that u can unpack the release (tar or zip) [ Passed: tar -zxvf ]
> b) Verify u r able to compile the distro [ Passed: With OpenJDK, Latest
> Maven on LatestDebian ]
> c) Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
>
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script. [Ongoing will update
> later]
>
> Regards
> Sotiris
>
> -----Original Message-----
> From: Suneel Marthi [mailto:suneel_marthi@yahoo.com]
> Sent: Thursday, January 16, 2014 4:41 PM
> To: user@mahout.apache.org; mahout
> Subject: MAHOUT 0.9 Release - New URL
>
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
>
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/
> apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> through all the different options in each script.
>
>
> Committers
> and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
>
>
RE: MAHOUT 0.9 Release - New URL
Posted by Sotiris Salloumis <in...@eprice.gr>.
Sorry, my mistake: the milliseconds figure was for the last test only. Below are the full results.
~/mahout/apache-maven-3.1.1/bin/mvn -DskipTests clean install
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 52.312s
[INFO] Finished at: Sat Jan 18 02:04:29 CET 2014
[INFO] Final Memory: 46M/305M
[INFO] ------------------------------------------------------------------------
~/mahout/apache-maven-3.1.1/bin/mvn clean test
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Mahout Build Tools ................................ SUCCESS [1.166s]
[INFO] Apache Mahout ..................................... SUCCESS [0.264s]
[INFO] Mahout Math ....................................... SUCCESS [58.639s]
[INFO] Mahout Core ....................................... SUCCESS [4:01.640s]
[INFO] Mahout Integration ................................ SUCCESS [21.481s]
[INFO] Mahout Examples ................................... SUCCESS [1.980s]
[INFO] Mahout Release Package ............................ SUCCESS [0.003s]
[INFO] Mahout Math/Scala wrappers ........................ SUCCESS [14.149s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5:39.563s
[INFO] Finished at: Sat Jan 18 02:10:53 CET 2014
[INFO] Final Memory: 51M/1068M
[INFO] ------------------------------------------------------------------------
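As a sanity check on the reactor summary above, the per-module durations can be added up and compared with the reported total. This is a minimal sketch; ReactorTimes is a hypothetical helper, and the parser only assumes the "[m:]ss.SSSs" duration format that appears in this log.

```java
import java.math.BigDecimal;

/** Hypothetical helper for adding up Maven reactor durations such as "4:01.640s". */
final class ReactorTimes {

    /** Converts "ss.SSSs", "m:ss.SSSs" or "h:mm:ss.SSSs" into milliseconds. */
    static long toMillis(String duration) {
        String text = duration.trim();
        if (text.endsWith("s")) {
            text = text.substring(0, text.length() - 1);
        }
        String[] parts = text.split(":");
        long minutes = 0;
        // Every field before the last one is a base-60 digit (hours, minutes).
        for (int i = 0; i < parts.length - 1; i++) {
            minutes = minutes * 60 + Long.parseLong(parts[i]);
        }
        BigDecimal seconds = new BigDecimal(parts[parts.length - 1]);
        return minutes * 60_000L + seconds.movePointRight(3).longValueExact();
    }

    /** Sums several durations, in milliseconds. */
    static long sumMillis(String... durations) {
        long total = 0;
        for (String d : durations) {
            total += toMillis(d);
        }
        return total;
    }

    public static void main(String[] args) {
        long modules = sumMillis("1.166s", "0.264s", "58.639s", "4:01.640s",
                                 "21.481s", "1.980s", "0.003s", "14.149s");
        long reported = toMillis("5:39.563s");
        System.out.println("modules=" + modules + " ms, reported=" + reported + " ms");
    }
}
```

The eight module times add up to 339,322 ms against a reported total of 339,563 ms, which is consistent with a test run of several minutes rather than 370 milliseconds.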
From: Ted Dunning [mailto:ted.dunning@gmail.com]
Sent: Saturday, January 18, 2014 2:50 AM
To: Mahout Dev List; Sotiris Salloumis
Cc: Suneel Marthi; user@mahout.apache.org
Subject: Re: MAHOUT 0.9 Release - New URL
On Thu, Jan 16, 2014 at 7:35 AM, Sotiris Salloumis <info@eprice.gr> wrote:
c) Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
?!
Was that seconds? Or really milliseconds?
Re: MAHOUT 0.9 Release - New URL
Posted by Ted Dunning <te...@gmail.com>.
On Thu, Jan 16, 2014 at 7:35 AM, Sotiris Salloumis <in...@eprice.gr> wrote:
> c) Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
>
?!
Was that seconds? Or really milliseconds?
RE: MAHOUT 0.9 Release - New URL
Posted by Sotiris Salloumis <in...@eprice.gr>.
Hi Suneel,
Below first round of tests,
Environment: SMP Debian 3.2.51-1 x86_64
Machine: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz stepping 05 12GB
RAM
OpenJDK: javac 1.6.0_27
a) Verify that u can unpack the release (tar or zip) [ Passed: tar -zxvf ]
b) Verify u r able to compile the distro [ Passed: With OpenJDK, Latest
Maven on LatestDebian ]
c) Run through the unit tests: mvn clean test [ Passed: 370 milliseconds]
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
through all the different options in each script. [Ongoing will update
later]
Regards
Sotiris
-----Original Message-----
From: Suneel Marthi [mailto:suneel_marthi@yahoo.com]
Sent: Thursday, January 16, 2014 4:41 PM
To: user@mahout.apache.org; mahout
Subject: MAHOUT 0.9 Release - New URL
Third time's a Charm!!!
Here's the new URL for Mahout 0.9 Release:
https://repository.apache.org/content/repositories/orgapachemahout-1002/org/
apache/mahout/mahout-distribution/0.9/
For those volunteering to test this, some of the things to be verified:
a) Verify that u can unpack the release (tar or zip)
b) Verify u r able to compile the distro
c) Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
through all the different options in each script.
Committers
and PMC members:
---------------------------------------
Need 'at least 3 +1 votes' for the Release to pass.
Thanks and Regards.
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have been removed from 0.9, cluster-syntheticcontrol.sh options 4,5 are not gonna work and should have been removed for 0.9.
To PMC,
-> rollback the release, fix this issue (and other patches that were submitted in the last few days) and put out another release ?
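The traces in Andrew's report below show MahoutDriver.addClass failing inside Class.forName for the removed Dirichlet and Meanshift job classes. A minimal sketch of how a launcher or example script could probe for a class before offering it as an option follows. DriverClassCheck is a hypothetical helper, not part of Mahout, and it does not describe how MahoutDriver itself is written.

```java
/** Hypothetical helper that checks whether a class name resolves on the current classpath. */
final class DriverClassCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DriverClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the class was removed or its jar is missing
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "java.lang.String",
            "org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job",
            "org.apache.mahout.clustering.syntheticcontrol.meanshift.Job",
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + isOnClasspath(name));
        }
    }
}
```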
On Monday, January 20, 2014 12:33 AM, Andrew Palumbo <ap...@outlook.com> wrote:
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had a bit of trouble getting the Hadoop natives to compile, so I may have run into some problems because of the Hadoop setup. I ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when I'm sure I've got Hadoop set up right.
Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0
a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed ]
b) Verify u r able to compile the distro
mvn compile- [passed with warnings]
[WARNING] Expected all dependencies to require Scala version: 2.9.3
[WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
[WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
[WARNING] Multiple versions of scala libraries detected!
c) Run through the unit tests: mvn clean test
mvn clean test [passed]
d) Run the example scripts under $MAHOUT_HOME/examples/bin.
Please run through all the different options in each script
Running example scripts with $MAHOUT_LOCAL=true
./cluster-syntheticcontrol.sh ->1 [works]
./cluster-syntheticcontrol.sh ->2 [works]
./cluster-syntheticcontrol.sh ->3 [works]
./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
[...]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
./classify-20newsgroups.sh ->1 [works]
./classify-20newsgroups.sh ->2 [works]
cluster-reuters.sh ->1 [works]
cluster-reuters.sh ->2 [works]
cluster-reuters.sh ->3 [works]
Same error as noted previously in the thread:
cluster-reuters.sh ->4 [0 clusters]
[...]
WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 669 ms (Minutes: 0.01115)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
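The "Infinity" and "NaN" values in the output above are what plain floating-point arithmetic yields when there are no clusters at all. A minimal sketch follows, assuming the Dunn index is computed as the minimum inter-cluster distance divided by the maximum intra-cluster distance, and the Davies-Bouldin index as an average of per-cluster scores. EmptyClusterIndices is a hypothetical class; whether Mahout's own implementation follows exactly this path is an assumption.

```java
/** Hypothetical illustration of cluster-quality indices evaluated on an empty clustering. */
final class EmptyClusterIndices {

    /** Minimum inter-cluster distance divided by maximum intra-cluster distance. */
    static double dunnIndex(double[] interClusterDistances, double[] intraClusterDistances) {
        double minInter = Double.POSITIVE_INFINITY; // minimum over an empty set
        for (double d : interClusterDistances) {
            minInter = Math.min(minInter, d);
        }
        double maxIntra = 0.0; // maximum over an empty set of non-negative distances
        for (double d : intraClusterDistances) {
            maxIntra = Math.max(maxIntra, d);
        }
        return minInter / maxIntra;
    }

    /** Average of the per-cluster worst-case similarity scores. */
    static double daviesBouldinIndex(double[] perClusterScores) {
        double sum = 0.0;
        for (double s : perClusterScores) {
            sum += s;
        }
        return sum / perClusterScores.length; // 0.0 / 0 when there are no clusters
    }

    public static void main(String[] args) {
        double[] none = new double[0];
        System.out.println("[Dunn Index] " + dunnIndex(none, none));
        System.out.println("[Davies-Bouldin Index] " + daviesBouldinIndex(none));
    }
}
```

So both values are a symptom of "Num clusters: 0" rather than separate failures.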
> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL
> To: user@mahout.apache.org; dev@mahout.apache.org
>
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
>
> Committers
> and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
RE: MAHOUT 0.9 Release - New URL
Posted by Andrew Palumbo <ap...@outlook.com>.
I checked out the latest code. Compiled and tested on a CentOS 6 AMD64 VM without incident.
$MAHOUT_LOCAL=true
classify-20newsgroups.sh->1
classify-20newsgroups.sh->2
classify-20newsgroups.sh->3
cluster-reuters.sh->1
cluster-reuters.sh->2
cluster-reuters.sh->3
cluster-reuters.sh->4 [0 clusters]
cluster-syntheticcontrol.sh->1
cluster-syntheticcontrol.sh->2
cluster-syntheticcontrol.sh->3
factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat
All ran without incident.
I don't have the Netflix dataset, so I can't test that.
> Date: Tue, 21 Jan 2014 13:47:27 -0800
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: andrew.musselman@gmail.com
> To: dev@mahout.apache.org
>
> *classify-20newsgroups.sh*
>
> *Complementary naive bayes:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 11207 98.9406%
> Incorrectly Classified Instances : 120 1.0594%
> Total Classified Instances : 11327
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
>    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
>  475   0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0  |  478  a = alt.atheism
>    0 597   1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0  |  605  b = comp.graphics
>    0   1 620   3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0  |  627  c = comp.os.ms-windows.misc
>    1   1   1 593   2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0  |  599  d = comp.sys.ibm.pc.hardware
>    0   1   1   0 568   0   1   0   0   0   1   1   2   0   0   0   0   1   0   0  |  576  e = comp.sys.mac.hardware
>    0   4   2   0   0 581   0   0   0   0   0   0   0   0   0   0   0   0   0   0  |  587  f = comp.windows.x
>    0   0   0   1   2   0 571   3   0   0   1   1   4   1   0   0   0   0   0   0  |  584  g = misc.forsale
>    0   0   0   1   0   0   0 589   1   0   0   1   1   0   0   0   0   0   0   0  |  593  h = rec.autos
>    0   0   0   0   0   0   0   1 565   0   0   0   0   0   1   0   0   0   0   0  |  567  i = rec.motorcycles
>    0   0   0   0   0   0   0   0   0 600   2   0   0   0   1   0   0   0   0   0  |  603  j = rec.sport.baseball
>    0   0   0   0   0   0   0   0   0   1 584   0   0   0   0   0   0   0   0   0  |  585  k = rec.sport.hockey
>    0   0   0   0   0   0   0   0   0   0   0 579   0   0   0   0   0   1   0   0  |  580  l = sci.crypt
>    0   0   0   1   3   0   2   0   0   2   0   0 567   1   2   1   0   0   0   0  |  579  m = sci.electronics
>    0   0   0   0   0   0   0   0   0   0   0   0   1 605   0   0   0   0   0   0  |  606  n = sci.med
>    0   0   0   0   0   0   0   0   0   0   0   0   0   0 602   0   0   0   0   0  |  602  o = sci.space
>    0   0   0   0   0   0   0   0   0   0   0   0   0   1   0 602   0   0   1   0  |  604  p = soc.religion.christian
>    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 556   0   0   0  |  556  q = talk.politics.mideast
>    0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0 568   0   0  |  571  r = talk.politics.guns
>   11   0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4 338   2  |  369  s = talk.religion.misc
>    0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0 447  |  456  t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.9806
> Accuracy 98.9406%
> Reliability 94.0932%
> Reliability (standard deviation) 0.2163
>
> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 15870 ms (Minutes: 0.2645)
> + echo 'Testing on holdout set'
> Testing on holdout set
> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>
> [snip]
>
> INFO: Complementary Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 6715 89.3071%
> Incorrectly Classified Instances : 804 10.6929%
> Total Classified Instances : 7519
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d e f g h i j
> k l m n o p q r s
> t <--Classified as
> 298 0 0 0 0 0 0 0 0 1
> 0 0 0 1 2 5 1 0 13
> 0 | 321 a = alt.atheism
> 0 298 11 6 1 12 2 2 1 1
> 3 8 3 4 2 4 1 4 4
> 1 | 368 b = comp.graphics
> 1 17 286 16 4 9 6 3 2 0
> 1 0 1 7 1 0 2 1 0
> 1 | 358 c = comp.os.ms-windows.misc
> 2 6 11 309 9 5 14 8 1 0
> 2 0 6 4 2 0 1 2 1
> 0 | 383 d = comp.sys.ibm.pc.hardware
> 0 10 8 7 334 7 5 5 2 0
> 3 0 2 1 1 0 1 1 0
> 0 | 387 e = comp.sys.mac.hardware
> 1 13 7 8 2 355 2 0 2 0
> 0 5 1 1 3 0 0 1 0
> 0 | 401 f = comp.windows.x
> 0 7 11 29 12 9 268 16 8 4
> 3 2 6 4 2 1 3 1 2
> 3 | 391 g = misc.forsale
> 0 1 0 0 3 0 7 362 8 2
> 2 1 2 0 2 0 1 2 0
> 4 | 397 h = rec.autos
> 0 0 0 1 0 0 1 0 423 0
> 0 0 2 1 0 1 0 0 0
> 0 | 429 i = rec.motorcycles
> 0 0 1 0 0 0 0 2 2 371
> 8 0 2 3 0 2 0 0 0
> 0 | 391 j = rec.sport.baseball
> 0 0 1 0 0 0 1 0 0 2
> 409 0 0 0 0 0 0 0 0
> 1 | 414 k = rec.sport.hockey
> 0 0 1 2 1 0 1 0 0 0
> 0 404 0 0 0 0 0 1 0
> 1 | 411 l = sci.crypt
> 0 5 4 11 1 3 7 9 2 5
> 3 3 339 2 6 0 1 1 2
> 1 | 405 m = sci.electronics
> 0 4 0 1 0 0 0 1 0 1
> 1 0 3 367 3 1 2 0 0
> 0 | 384 n = sci.med
> 0 1 2 0 1 0 2 0 0 1
> 0 0 1 1 375 0 1 0 0
> 0 | 385 o = sci.space
> 4 2 1 1 0 0 1 1 2 0
> 0 1 1 5 1 367 4 0 1
> 1 | 393 p = soc.religion.christian
> 0 1 0 0 0 0 0 0 0 2
> 0 0 0 0 0 2 378 0 1
> 0 | 384 q = talk.politics.mideast
> 0 0 0 0 0 2 1 1 1 1
> 0 3 0 3 0 0 2 319 2
> 4 | 339 r = talk.politics.guns
> 32 0 0 1 0 0 0 0 0 1
> 1 1 0 2 2 26 5 7 175
> 6 | 259 s = talk.religion.misc
> 0 0 0 2 0 0 0 0 0 1
> 2 2 0 1 2 1 10 18 2
> 278 | 319 t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.8594
> Accuracy 89.3071%
> Reliability 84.611%
> Reliability (standard deviation) 0.2148
>
> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>
>
> *Naive bayes:*
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 11286 99.0869%
> Incorrectly Classified Instances : 104 0.9131%
> Total Classified Instances : 11390
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d e f g h i j
> k l m n o p q r s
> t <--Classified as
> 474 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 2
> 1 | 477 a = alt.atheism
> 0 566 0 2 0 1 0 0 0 0
> 0 0 0 0 0 0 0 0 0
> 0 | 569 b = comp.graphics
> 0 10 590 29 2 4 1 0 0 0
> 0 0 1 0 0 0 0 0 0
> 1 | 638 c = comp.os.ms-windows.misc
> 0 0 0 596 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0
> 0 | 596 d = comp.sys.ibm.pc.hardware
> 0 0 0 0 575 0 1 0 0 0
> 0 0 1 0 0 0 0 0 0
> 0 | 577 e = comp.sys.mac.hardware
> 0 2 2 2 0 593 1 0 0 0
> 0 0 0 0 1 0 0 0 0
> 0 | 601 f = comp.windows.x
> 0 0 0 1 0 0 589 1 0 0
> 1 0 2 0 0 0 0 0 0
> 0 | 594 g = misc.forsale
> 0 0 0 0 0 0 0 594 0 0
> 0 0 0 0 0 0 0 0 0
> 0 | 594 h = rec.autos
> 0 0 0 0 0 0 0 0 611 0
> 0 0 0 0 0 0 0 0 0
> 0 | 611 i = rec.motorcycles
> 0 0 0 0 0 0 0 0 0 616
> 1 0 0 0 0 0 0 0 0
> 0 | 617 j = rec.sport.baseball
> 0 0 0 0 0 0 1 0 0 0
> 620 0 0 0 0 0 0 0 0
> 0 | 621 k = rec.sport.hockey
> 0 0 0 0 0 0 0 0 0 0
> 0 580 0 0 0 0 0 1 0
> 0 | 581 l = sci.crypt
> 0 0 0 3 1 0 0 0 0 0
> 0 0 571 0 0 0 0 0 0
> 0 | 575 m = sci.electronics
> 0 0 0 0 0 0 0 0 0 0
> 0 0 2 583 0 0 0 0 0
> 0 | 585 n = sci.med
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 1 599 0 0 0 0
> 0 | 600 o = sci.space
> 0 1 0 0 0 0 0 0 0 0
> 0 0 0 0 0 615 0 0 0
> 0 | 616 p = soc.religion.christian
> 1 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 1 560 0 0
> 0 | 562 q = talk.politics.mideast
> 0 0 1 0 0 0 0 0 0 0
> 0 1 0 0 0 0 0 548 0
> 1 | 551 r = talk.politics.guns
> 10 0 0 0 0 0 0 0 0 0
> 0 0 0 0 1 1 0 2 344
> 1 | 359 s = talk.religion.misc
> 0 0 0 0 0 0 0 0 0 0
> 0 1 1 0 0 0 0 2 0
> 462 | 466 t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.9847
> Accuracy 99.0869%
> Reliability 94.3334%
> Reliability (standard deviation) 0.2169
>
> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 14304 ms (Minutes: 0.2384)
> + echo 'Testing on holdout set'
> Testing on holdout set
>
> [snip]
>
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 6718 90.1019%
> Incorrectly Classified Instances : 738 9.8981%
> Total Classified Instances : 7456
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d e f g h i j
> k l m n o p q r s
> t <--Classified as
> 294 0 0 0 0 0 0 0 0 0
> 0 2 0 1 1 6 1 1 16
> 0 | 322 a = alt.atheism
> 0 345 6 14 6 11 6 0 0 0
> 0 5 7 1 3 0 0 0 0
> 0 | 404 b = comp.graphics
> 2 29 177 78 22 19 9 1 0 0
> 0 4 2 0 1 1 0 0 1
> 1 | 347 c = comp.os.ms-windows.misc
> 1 9 2 335 18 2 10 0 0 0
> 1 0 8 0 0 0 0 0 0
> 0 | 386 d = comp.sys.ibm.pc.hardware
> 1 4 2 13 347 3 5 1 0 0
> 1 0 7 1 0 0 0 1 0
> 0 | 386 e = comp.sys.mac.hardware
> 0 20 0 4 0 352 4 0 0 0
> 0 0 1 1 3 0 1 0 1
> 0 | 387 f = comp.windows.x
> 0 2 0 21 5 1 323 7 2 2
> 0 2 12 0 3 0 0 0 0
> 1 | 381 g = misc.forsale
> 0 1 0 0 1 0 15 363 8 1
> 0 0 4 1 0 0 0 1 0
> 1 | 396 h = rec.autos
> 0 1 0 0 0 0 6 6 370 0
> 0 0 0 1 0 0 0 0 1
> 0 | 385 i = rec.motorcycles
> 1 0 0 1 1 0 2 1 2 362
> 5 0 2 0 0 0 0 0 0
> 0 | 377 j = rec.sport.baseball
> 0 0 0 1 2 0 0 0 0 3
> 371 0 0 0 0 0 0 0 0
> 1 | 378 k = rec.sport.hockey
> 0 3 1 0 1 0 2 0 0 0
> 0 396 0 1 0 0 1 1 1
> 3 | 410 l = sci.crypt
> 0 7 0 7 7 2 6 4 0 0
> 0 1 369 2 2 0 0 0 0
> 2 | 409 m = sci.electronics
> 0 3 0 2 1 0 2 0 0 0
> 0 1 4 383 4 0 0 1 0
> 4 | 405 n = sci.med
> 0 5 0 0 1 0 3 0 0 0
> 0 0 1 0 374 1 0 0 1
> 1 | 387 o = sci.space
> 6 2 0 1 1 0 0 1 0 1
> 0 0 1 5 0 352 2 1 7
> 1 | 381 p = soc.religion.christian
> 1 1 0 0 0 0 0 0 0 0
> 1 0 0 0 0 0 373 1 0
> 1 | 378 q = talk.politics.mideast
> 0 0 0 0 0 0 1 0 1 0
> 0 2 0 0 0 0 0 346 2
> 7 | 359 r = talk.politics.guns
> 26 1 0 1 0 0 0 2 0 1
> 1 0 0 1 1 20 2 6 200
> 7 | 269 s = talk.religion.misc
> 1 0 0 0 0 0 0 2 0 0
> 1 0 0 2 2 0 1 14 0
> 286 | 309 t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.8726
> Accuracy 90.1019%
> Reliability 85.4491%
> Reliability (standard deviation) 0.2222
>
> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10878 ms (Minutes: 0.1813)
>
> *SGD:*
> 7532 test files
>
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 5649 75%
> Incorrectly Classified Instances : 1883 25%
> Total Classified Instances : 7532
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d e f g h i j
> k l m n o p q r s
> t <--Classified as
> 186 6 3 10 5 0 33 4 13 15
> 7 1 24 15 3 15 5 5 29
> 15 | 394 a = sci.space
> 5 309 0 3 2 5 0 0 0 1
> 9 21 2 0 0 18 4 4 1
> 1 | 385 b = comp.sys.mac.hardware
> 4 1 101 3 0 1 63 0 7 0
> 1 1 5 16 3 0 3 7 1
> 34 | 251 c = talk.religion.misc
> 11 12 1 265 1 10 3 0 0 17
> 10 11 5 2 0 11 3 6 21
> 0 | 389 d = comp.graphics
> 2 1 1 0 349 2 3 0 3 2
> 6 1 5 1 0 2 15 2 1
> 2 | 398 e = rec.motorcycles
> 7 20 3 19 2 254 6 0 2 11
> 2 39 7 2 0 4 2 2 9
> 3 | 394 f = comp.os.ms-windows.misc
> 2 1 13 0 0 0 247 0 1 1
> 3 0 6 2 4 0 2 3 5
> 29 | 319 g = alt.atheism
> 1 1 0 0 2 0 2 361 0 1
> 2 0 2 0 0 1 3 22 0
> 1 | 399 h = rec.sport.hockey
> 3 0 3 1 0 0 5 0 161 0
> 1 2 12 102 0 0 1 2 11
> 6 | 310 i = talk.politics.misc
> 2 8 0 19 0 19 0 0 1 294
> 10 11 4 2 0 5 0 3 11
> 6 | 395 j = comp.windows.x
> 2 10 0 1 1 0 0 0 0 1
> 347 13 2 1 0 5 3 2 2
> 0 | 390 k = misc.forsale
> 1 36 0 6 1 25 0 0 1 6
> 10 257 2 1 0 34 6 0 6
> 0 | 392 l = comp.sys.ibm.pc.hardware
> 2 2 2 2 1 0 12 0 0 6
> 10 4 312 5 2 13 11 3 3
> 6 | 396 m = sci.med
> 2 0 3 2 1 0 0 1 13 0
> 5 1 2 314 2 0 2 2 10
> 4 | 364 n = talk.politics.guns
> 1 0 2 1 1 0 34 1 33 1
> 3 0 1 8 271 1 4 5 6
> 3 | 376 o = talk.politics.mideast
> 3 14 0 8 2 8 3 1 1 7
> 12 29 6 2 1 245 13 2 32
> 4 | 393 p = sci.electronics
> 3 3 0 2 11 0 1 0 2 1
> 11 6 4 2 0 11 330 4 4
> 1 | 396 q = rec.autos
> 0 0 1 0 1 0 4 12 3 1
> 3 0 0 0 0 5 6 359 1
> 1 | 397 r = rec.sport.baseball
> 0 1 0 0 0 1 0 0 3 3
> 0 0 3 2 1 6 1 6 366
> 3 | 396 s = sci.crypt
> 0 2 11 1 1 0 40 0 1 2
> 3 4 2 1 0 5 0 2 2
> 321 | 398 t = soc.religion.christian
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.7073
> Accuracy 75%
> Reliability 70.6238%
> Reliability (standard deviation) 0.2187
> Log-likelihood mean : -1.1182
> 25%-ile : -1.6911
> 75%-ile : -0.0803
>
> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>
>
>
>
> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
>
> > Thanks Andrew for reporting that. I rolled back the release to fix this
> > and a few other issues.
> >
> > We have removed asf-examples*.sh from trunk as the sample file at the URL
> > mentioned in your email is not available.
> > This is something we need to fix and restore in 1.0.
> >
> >
> >
> >
> >
> >
> >
> > On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
> > wrote:
> >
> > from the asf-email-examples.sh script:
> >
> > # You will need to download or otherwise obtain some or all of the Amazon
> > # ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
> > # to use this script.
> > # To obtain a full copy you will need to launch an EC2 instance and mount
> > # the dataset to download it, otherwise you can get a sample of it at
> > # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> >
> > It looks like the
> > http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > link is down.
> >
> > Is there somewhere else that we can get a subset of the ASF emails?
> >
> >
> >
> > Date: Tue, 21 Jan 2014 09:48:06 -0800
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > From: andrew.musselman@gmail.com
> > > To: dev@mahout.apache.org
> > >
> > > Sure thing; continuing to smoke test the other examples tonight
> > >
> > >
> > > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com
> > >wrote:
> > >
> > > > > Thanks Andrew M., I see that some of the example scripts need to be
> > > > > fixed as they still refer to the deprecated algorithms.
> > > > > I see that Streaming KMeans has failed for you as well.
> > > >
> > > > I'll be rolling back the release today to fix these issues.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > > > andrew.musselman@gmail.com> wrote:
> > > >
> > > > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> > > > > 64-bit Linux AMI from tarball.
> > > >
> > > > All tests pass.
> > > >
> > > > *Output of examples:*
> > > > *asf-email-examples.sh, run on mahout.apache.org
> > > > <http://mahout.apache.org>:*
> > > > *recommendations:*
> > > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > > > 1
> > > >
> > > >
> > [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > > > 4
> > > >
> > > >
> > [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > > > 6
> > > >
> > > >
> > [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > > > 8
> > > > [12758:1.0,19409:1.0,11112:1.0]
> > > > 11
> > > >
> > > >
> > [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > > > 14
> > > >
> > > >
> > [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > > > 15
> > > >
> > > >
> > [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > > > 16
> > > >
> > > >
> > [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > > 18
> > > >
> > > >
> > [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > > 20
> > > >
> > > >
> > [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > > [snip]
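For anyone checking the recommendations output by script rather than by eye, here is a minimal Python sketch that parses one record of the form shown above (a user ID followed by a bracketed list of item:score pairs). The function name `parse_recommendation_line` is made up for illustration, and the sketch assumes each record has been rejoined onto a single line with whitespace between the ID and the list.

```python
def parse_recommendation_line(line):
    """Split 'userID [item:score,item:score,...]' into (user_id, [(item_id, score), ...])."""
    # Assumption: the user ID and the bracketed list are separated by whitespace.
    user_part, items_part = line.split(None, 1)
    items_part = items_part.strip()
    if not (items_part.startswith("[") and items_part.endswith("]")):
        raise ValueError("expected a bracketed item list: %r" % line)
    pairs = []
    body = items_part[1:-1]
    if body:
        for entry in body.split(","):
            item_id, score = entry.split(":")
            pairs.append((int(item_id), float(score)))
    return int(user_part), pairs


if __name__ == "__main__":
    # Sample record copied from the output above (user 8).
    user_id, recs = parse_recommendation_line("8 [12758:1.0,19409:1.0,11112:1.0]")
    print(user_id, recs)
```

A quick sanity check on top of this would be to confirm that every user has at least one recommendation and that all scores parse as floats.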
> > > >
> > > > *clustering; kmeans:*
> > > > [snip]
> > > > Weight : [props - optional]: Point:
> > > > 1.0 :
> > > > [distance-squared=1.0193102046188427]:
> > > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > 7573:0.204,
> > > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > > > 39789:0.110, 40743:0.190, 45775:0.086]
> > > > 1.0 : [distance-squared=0.9823018320457279]:
> > > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > 5336:0.106,
> > > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > > > 1.0 : [distance-squared=0.9509142993214911]:
> > > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
> > > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > > > 4419:0.076,
> > > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > 10225:0.081,
> > > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > > > 41280:0.065, 41696:0.072, 41947:0.118,
> > > > 43685:0.086, 44077:0.308,
> > > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > > [snip]
> > > >
> > > > *clustering; dirichlet:*
> > > > Get this complaint:
> > > > Running Dirichlet with K = 8
> > > > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > > > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > > > > Unknown program 'dirichlet' chosen.
> > > > >
> > > > > *clustering: minhash:*
> > > > > Running Minhash
> > > > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > > > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > > > > Unknown program 'minhash' chosen.
> > > >
> > > > *classification; standard:*
> > > > =======================================================
> > > > Summary
> > > > -------------------------------------------------------
> > > > Correctly Classified Instances : 5384 87.7874%
> > > > Incorrectly Classified Instances : 749 12.2126%
> > > > Total Classified Instances : 6133
> > > >
> > > > =======================================================
> > > > Confusion Matrix
> > > > -------------------------------------------------------
> > > > a b c d
> > > > <--Classified as
> > > > 2949 7 531 25 | 3512 a = dev
> > > > 0 0 0 0 | 0 b = general
> > > > 99 8 1763 8 | 1878 c = user
> > > > 41 1 29 672 | 743 d = commits
> > > >
> > > > =======================================================
> > > > Statistics
> > > > -------------------------------------------------------
> > > > > Kappa 0.7877
> > > > Accuracy 87.7874%
> > > > Reliability 53.658%
> > > > Reliability (standard deviation) 0.4911
> > > >
> > > > *classification; complementary:*
> > > > =======================================================
> > > > Summary
> > > > -------------------------------------------------------
> > > > Correctly Classified Instances : 5530 90.1679%
> > > > Incorrectly Classified Instances : 603 9.8321%
> > > > > Total Classified Instances : 6133
> > > >
> > > > =======================================================
> > > > Confusion Matrix
> > > > -------------------------------------------------------
> > > > a b c d <--Classified as
> > > > 3168 0 276 68 | 3512 a = dev
> > > > 0 0 0 0 | 0 b = general
> > > > 196 0 1652 30 | 1878 c = user
> > > > 25 0 8 710 | 743 d =
> > > > commits
> > > >
> > > > =======================================================
> > > > Statistics
> > > > -------------------------------------------------------
> > > > Kappa 0.8259
> > > > Accuracy 90.1679%
> > > > Reliability 54.7459%
> > > > Reliability (standard deviation) 0.5005
> > > >
> > > > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > > >
> > > > *classification; sgd, with three categories:*
> > > > Running SGD Training
> > > > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > > 24168 training files
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000
> > > > 2
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10
> > > > 0.000 0.00 none
> > > > 0.00 0.00
> > > > 0.00 0.00 0.0000000 0.0000000 12
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000
> > > > 0.0000000 40
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100
> > > > 0.000
> > > > 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250
> > > > 0.000 0.00 none
> > > > 0.00 0.00
> > > > 0.00 0.00 0.0000000 0.0000000 300
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700
> > > > 0.000 0.00 none
> > > > 0.00 0.00 0.00 0.00 0.0000000
> > > > 0.0000000 800
> > > > 0.000 0.00 none
> > > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > > 1.0019413e-08 1000 -0.607 75.78 none
> > > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > > 1.0019413e-08 1200 -0.607 75.78 none
> > > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > > 1.0019413e-08 1400 -0.607 75.78 none
> > > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > > 1.0019413e-08 1500 -0.607 75.78 none
> > > > 0.24 43686.00 17924.00 329.50
> > > > 1.0571799e-08
> > > > 1.0032261e-08 2000 -0.487 82.65 none
> > > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > > > 1.0011902e-08 2500 -0.439 83.90 none
> > > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > > > 1.0011902e-08 3000 -0.439 83.90 none
> > > > 0.32 50635.00 28531.00 437.09 1.0551175e-08
> > > > 1.0000001e-08 4000 -0.351 88.14 none
> > > > 0.32 50635.00 32642.00 437.09 1.0551175e-08
> > > > 1.0000000e-08 5000 -0.378 87.10 none
> > > > 0.32 50635.00 36461.00 437.09
> > > > 1.0556652e-08
> > > > 1.0000001e-08 6000 -0.372 86.89 none
> > > > 0.32 50635.00 37768.00 437.09 1.0576742e-08
> > > > 1.0000001e-08 7000 -0.334 89.26 none
> > > > 0.32 50635.00 38807.00 437.09 1.0576742e-08
> > > > 1.0000000e-08 8000 -0.368 87.52 none
> > > > 0.32 50635.00 44731.00 437.09 1.0576716e-08
> > > > 1.0000000e-08 10000 -0.374 87.39 none
> > > > 0.32 50635.00 45672.00 437.09 1.0576716e-08
> > > > 1.0000000e-08 12000 -0.298 88.26 none
> > > > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > > > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > > > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > > > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > > > > at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > > > > at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > > > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > at java.lang.reflect.Method.invoke(Method.java:622)
> > > > > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > > > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > > > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > > > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > at java.lang.reflect.Method.invoke(Method.java:622)
> > > > > at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > > > > at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > > > > at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > > > > at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > > > > at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > > > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > > > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > > > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > > > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > > > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > > > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > at java.lang.Thread.run(Thread.java:701)
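One possible reading of this trace, offered as an assumption and not confirmed anywhere in the thread: the run was started with --categories=3, while the classification output earlier in this message lists four labels (dev, general, user, commits). If the trainer encodes the target in a vector sized from the category count, a label index beyond that count would overflow it at index 2. The Python toy model below is not Mahout code; `one_hot_gradient_target` and its sizing rule (length `num_categories - 1`, label 0 as the implicit reference class) are hypothetical and only illustrate that class of failure.

```python
def one_hot_gradient_target(label, num_categories):
    """Hypothetical model: encode a label in a vector of length num_categories - 1.

    Label 0 is treated as the implicit reference category and leaves the
    vector all zeros; any other label sets position label - 1.
    """
    target = [0.0] * (num_categories - 1)
    if label != 0:
        # Raises IndexError when label >= num_categories, i.e. when the data
        # contains more distinct labels than the configured category count.
        target[label - 1] = 1.0
    return target


if __name__ == "__main__":
    print(one_hot_gradient_target(2, 3))      # label fits a 3-category model
    try:
        one_hot_gradient_target(3, 3)         # a fourth label does not
    except IndexError as exc:
        print("out of range:", exc)
    print(one_hot_gradient_target(3, 4))      # fits once 4 categories are configured
```

If this reading is right, counting the distinct labels in the training split and comparing that count with the --categories argument would be a quick first check.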
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > > > andrew.musselman@gmail.com> wrote:
> > > >
> > > > > Trying out the build today
> > > > >
> > > > >
> > > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> > suneel_marthi@yahoo.com
> > > > >wrote:
> > > > >
> > > > > >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
> > > > > >> release. I will be rerolling the release today (in the next few hours) and
> > > > > >> putting out a new release candidate in staging.
> > > > >>
> > > > >> Thanks for reporting this Andrew P.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > > > ap.dev@outlook.com>
> > > > >> wrote:
> > > > >>
> > > > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
> > > > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > > > > >> have run into some problems because of the Hadoop setup. Ran into some
> > > > > >> problems in the example scripts, particularly with
> > > > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > > > > >> examples when I'm sure I've got Hadoop set up right.
> > > > >>
> > > > >>
> > > > >> Apache Maven 3.1.2-SNAPSHOT
> > > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> > > > > >> family: "unix"
> > > > >> $MAHOUT_LOCAL=true
> > > > >> Hadoop 2.2.0
> > > > >>
> > > > >>
> > > > > >> a) Verify that u can unpack the release (tar or zip) [passed (tar)]
> > > > > >>
> > > > > >> b) Verify u r able to compile the distro
> > > > > >>
> > > > > >> mvn compile [passed with warnings]
> > > > >>
> > > > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > > > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > > > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > > > > >> [WARNING] Multiple versions of scala libraries detected!
> > > > >>
> > > > >> c) Run through the unit tests: mvn clean test
> > > > >> mvn clean test [passed]
> > > > >>
> > > > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > > > > >> Please run through all the different options in each script
> > > > >>
> > > > >> Running example scripts with $MAHOUT_LOCAL=true
> > > > >>
> > > > >>
> > > > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > > > >>
> > > > >>
> > > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > > >> [...]
> > > > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > > >> at java.security.AccessController.doPrivileged(Native Method)
> > > > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > > >> at java.lang.Class.forName0(Native Method)
> > > > > >> at java.lang.Class.forName(Class.java:171)
> > > > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > > >>
> > > > >>
> > > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > > >>
> > > > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > > >> at java.security.AccessController.doPrivileged(Native Method)
> > > > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > > >> at java.lang.Class.forName0(Native Method)
> > > > > >> at java.lang.Class.forName(Class.java:171)
> > > > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > > > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > > > >>
> > > > >>
> > > > >> ./classify-20newsgroups.sh ->1 [works]
> > > > >> ./classify-20newsgroups.sh ->2 [works]
> > > > >>
> > > > >>
> > > > >> cluster-reuters.sh ->1 [works]
> > > > >> cluster-reuters.sh ->2 [works]
> > > > >> cluster-reuters.sh ->3 [works]
> > > > >>
> > > > >> Same error as noted previously in the thread:
> > > > >>
> > > > >> cluster-reuters.sh ->4 [0 clusters]
> > > > >>
> > > > >> [...]
> > > > >>
> > > > >> WARNING: No qualcluster.props found on classpath, will use
> > > > >> command-line arguments only
> > > > >> Num clusters: 0; maxDistance: 0.000000
> > > > >> [Dunn Index] First: Infinity
> > > > >> [Davies-Bouldin Index] First: NaN
> > > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > > > >> > From: suneel_marthi@yahoo.com
> > > > >> > Subject: MAHOUT 0.9 Release - New URL
> > > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > > > >> >
> > > > >> > Third time's a Charm!!!
> > > > >> >
> > > > >> >
> > > > >> > Here's the new URL for Mahout 0.9 Release:
> > > > >> >
> > > > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > > > >> >
> > > > >> > For those volunteering to test this, some of the things to be verified:
> > > > >> >
> > > > >> > a) Verify that u can unpack the release (tar or zip)
> > > > >> > b) Verify u r able to compile the distro
> > > > >> > c) Run through the unit tests: mvn clean test
> > > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the
> > > > >> > different options in each script.
> > > > >> >
> > > > >> >
> > > > >> > Committers
> > > > >> > and PMC members:
> > > > >> > ---------------------------------------
> > > > >> >
> > > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > > > >> >
> > > > >> >
> > > > >> > Thanks and Regards.
> > > > >>
> > > > >
> > > > >
> > > >
> >
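A ClassNotFoundException from MahoutDriver, as in the quoted traces above, can be narrowed down by checking whether the named class is packaged in the job jar at all. The following is only a rough sketch of that check: the helper names are made up for illustration, and the jar path in the usage comment is an assumption about where the examples job jar ends up after a build.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Mahout.

# Convert a fully qualified class name into the entry path used inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Read a jar listing on stdin and succeed if it contains the given class.
listing_has_class() {
  grep -qF -- "$(class_to_entry "$1")"
}

# Usage (the jar path is an assumption; adjust to the local build):
#   unzip -l "$MAHOUT_HOME"/examples/target/mahout-examples-0.9-job.jar \
#     | listing_has_class org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job \
#     || echo "class is not packaged in this jar"
```

If the class is missing from the listing, the script option is probably referring to a driver that this build does not ship, which would point at the example script rather than the local setup.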
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Rolled back trunk to 0.9-SNAPSHOT, please go ahead and commit any changes.
On Saturday, January 25, 2014 4:19 AM, Suneel Marthi <su...@yahoo.com> wrote:
I'll be rolling back the 0.9 Release today that's presently in staging in light of the issues that have been reported in the last 2 days and need to be fixed as part of the Release.
Please hold off from committing any new code to trunk meanwhile.
Thanks.
On Friday, January 24, 2014 7:36 PM, Ted Dunning <te...@gmail.com> wrote:
My schedule has opened up a bit and I can review as well.
On Fri, Jan 24, 2014 at 3:06 PM, Sebastian Schelter <ss...@googlemail.com> wrote:
I will try the next candidate again, so one vote is sure.
>On 24.01.2014 23:54, "Suneel Marthi" <su...@yahoo.com> wrote:
>
>
>> I am open to having the conversation (and a part of me feels that the
>> clusteringId fix should be in 0.9).
>>
>> If we decide to incorporate that into 0.9, I need to rollback the 0.9
>> Release that's presently out there in staging (for the 5th time in a row
>> now).
>> I am fine with doing that.
>>
>> What do you think we should do?
>>
>> a) Go ahead with 0.9 release without the fix for M-1410 .
>> b) Rollback 0.9 and include the fix for M-1410
>> c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
>> M-1410 and any other issues/enhancements that are fixed.
>>
>>
>> I am leaning towards (b), my only concern being that from my experience in
>> the past few weeks, it's become real hard to muster the minimum 3 +1 PMC
>> votes required for a release to pass.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
>> wrote:
>>
>>
>>
>> Can we hold a separate discussion about whether the clustering id issue
>> has to be in 0.9 while extending the vote deadline if necessary?
>>
>> If not, then all these votes are great and the release can go forward.
>>
>> If it is the sense that that fix has to be in, we should leave time for
>> people to reverse their votes to -1.
>>
>>
>>
>>
>> On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
>> wrote:
>>
>> Thanks to all those who volunteered. The voting for 0.9 Release closes
>> tomorrow.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
>> wrote:
>> >
>> >Using CentOS 6.5 and hadoop 1.2.1, all passed.
>> >
>> >+1 from me
>> >
>> >Gokhan
>> >
>> >
>> >
>> >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>> >
>> >> a),b),c),d) all passed on CentOS for me
>> >>
>> >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
>> >> > Subject: Re: MAHOUT 0.9 Release - New URL
>> >> > From: ssvinarchuk@hortonworks.com
>> >> > To: dev@mahout.apache.org
>> >> >
>> >> > I did a), b), c), d) and all steps pass.
>> >> > +1
>> >> >
>> >> >
>> >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
>> >> >wrote:
>> >> >
>> >> > > +1 from me.
>> >> > >
>> >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <suneel_marthi@yahoo.com
>> >
>> >> > > wrote:
>> >> > >
>> >> > > > Fixed the issues that were reported this week and restored FP
>> mining
>> >> > > into the codebase.
>> >> > > >
>> >> > > > Here's the URL for the final release in staging:-
>> >> > > >
>> >> > >
>> >>
>> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>> >> > > >
>> >> > > > The artifacts have been signed with the
>> > following key:
>> >> > > > https://people.apache.org/keys/committer/smarthi.asc
>> >> > > >
>> >> > > >
>> >> > > > a) Verify that u can unpack the release (tar or zip)
>> >> > > > b) Verify u r able to compile the distro
>> >> > > > c) Run through the unit tests: mvn clean test
>> >> > > > d) Run the example scripts under
>> > $MAHOUT_HOME/examples/bin. Please
>> >> run
>> >> > > through all the different options in each script.
>> >> > > >
>> >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
>> >> to be
>> >> > > finalized.
>> >> > >
>> >> > > --------------------------------------------
>> >> > > Grant Ingersoll | @gsingers
>> >> > > http://www.lucidworks.com
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >> > --
>> >>
>> > > CONFIDENTIALITY NOTICE
>> >> > NOTICE: This message is intended for the use of the individual or
>> entity
>> >> to
>> >> > which it is addressed and may contain information that is
>> confidential,
>> >> > privileged and exempt from disclosure under applicable law. If the
>> reader
>> >> > of this message is not the intended recipient, you are hereby notified
>> >> that
>> >> > any printing, copying, dissemination, distribution, disclosure or
>> >> > forwarding of this communication is strictly prohibited. If you have
>> >> > received this communication in error, please contact the sender
>> >> immediately
>> >> > and delete it from your system. Thank You.
>> >>
>> >>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
I'll be rolling back the 0.9 Release today that's presently in staging in light of the issues that have been reported in the last 2 days and need to be fixed as part of the Release.
Please hold off from committing any new code to trunk meanwhile.
Thanks.
On Friday, January 24, 2014 7:36 PM, Ted Dunning <te...@gmail.com> wrote:
My schedule has opened up a bit and I can review as well.
On Fri, Jan 24, 2014 at 3:06 PM, Sebastian Schelter <ss...@googlemail.com> wrote:
I will try the next candidate again, so one vote is sure.
>On 24.01.2014 23:54, "Suneel Marthi" <su...@yahoo.com> wrote:
>
>
>> I am open to having the conversation (and a part of me feels that the
>> clusteringId fix should be in 0.9).
>>
>> If we decide to incorporate that into 0.9, I need to rollback the 0.9
>> Release that's presently out there in staging (for the 5th time in a row
>> now).
>> I am fine with doing that.
>>
>> What do you think we should do?
>>
>> a) Go ahead with 0.9 release without the fix for M-1410 .
>> b) Rollback 0.9 and include the fix for M-1410
>> c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
>> M-1410 and any other issues/enhancements that are fixed.
>>
>>
>> I am leaning towards (b), my only concern being that from my experience in
>> the past few weeks, it's become real hard to muster the minimum 3 +1 PMC
>> votes required for a release to pass.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
>> wrote:
>>
>>
>>
>> Can we hold a separate discussion about whether the clustering id issue
>> has to be in 0.9 while extending the vote deadline if necessary?
>>
>> If not, then all these votes are great and the release can go forward.
>>
>> If it is the sense that that fix has to be in, we should leave time for
>> people to reverse their votes to -1.
>>
>>
>>
>>
>> On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
>> wrote:
>>
>> Thanks to all those who volunteered. The voting for 0.9 Release closes
>> tomorrow.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
>> wrote:
>> >
>> >Using CentOS 6.5 and hadoop 1.2.1, all passed.
>> >
>> >+1 from me
>> >
>> >Gokhan
>> >
>> >
>> >
>> >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>> >
>> >> a),b),c),d) all passed on CentOS for me
>> >>
>> >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
>> >> > Subject: Re: MAHOUT 0.9 Release - New URL
>> >> > From: ssvinarchuk@hortonworks.com
>> >> > To: dev@mahout.apache.org
>> >> >
>> >> > I did a), b), c), d) and all steps pass.
>> >> > +1
>> >> >
>> >> >
>> >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
>> >> >wrote:
>> >> >
>> >> > > +1 from me.
>> >> > >
>> >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <suneel_marthi@yahoo.com
>> >
>> >> > > wrote:
>> >> > >
>> >> > > > Fixed the issues that were reported this week and restored FP
>> mining
>> >> > > into the codebase.
>> >> > > >
>> >> > > > Here's the URL for the final release in staging:-
>> >> > > >
>> >> > >
>> >>
>> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>> >> > > >
>> >> > > > The artifacts have been signed with the
>> > following key:
>> >> > > > https://people.apache.org/keys/committer/smarthi.asc
>> >> > > >
>> >> > > >
>> >> > > > a) Verify that u can unpack the release (tar or zip)
>> >> > > > b) Verify u r able to compile the distro
>> >> > > > c) Run through the unit tests: mvn clean test
>> >> > > > d) Run the example scripts under
>> > $MAHOUT_HOME/examples/bin. Please
>> >> run
>> >> > > through all the different options in each script.
>> >> > > >
>> >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
>> >> to be
>> >> > > finalized.
>> >> > >
>> >> > > --------------------------------------------
>> >> > > Grant Ingersoll | @gsingers
>> >> > > http://www.lucidworks.com
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> >>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Ted Dunning <te...@gmail.com>.
My schedule has opened up a bit and I can review as well.
On Fri, Jan 24, 2014 at 3:06 PM, Sebastian Schelter <ssc.open@googlemail.com
> wrote:
> I will try the next candidate again, so one vote is sure.
> On 24.01.2014 23:54, "Suneel Marthi" <su...@yahoo.com> wrote:
>
> > I am open to having the conversation (and a part of me feels that the
> > clusteringId fix should be in 0.9).
> >
> > If we decide to incorporate that into 0.9, I need to rollback the 0.9
> > Release that's presently out there in staging (for the 5th time in a row
> > now).
> > I am fine with doing that.
> >
> > What do you think we should do?
> >
> > a) Go ahead with 0.9 release without the fix for M-1410 .
> > b) Rollback 0.9 and include the fix for M-1410
> > c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
> > M-1410 and any other issues/enhancements that are fixed.
> >
> >
> > I am leaning towards (b), my only concern being that from my experience
> in
> > the past few weeks, it's become real hard to muster the minimum 3 +1 PMC
> > votes required for a release to pass.
> >
> >
> >
> >
> >
> >
> >
> >
> > On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> >
> >
> > Can we hold a separate discussion about whether the clustering id issue
> > has to be in 0.9 while extending the vote deadline if necessary?
> >
> > If not, then all these votes are great and the release can go forward.
> >
> > If it is the sense that that fix has to be in, we should leave time for
> > people to reverse their votes to -1.
> >
> >
> >
> >
> > On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
> > wrote:
> >
> > Thanks to all those who volunteered. The voting for 0.9 Release closes
> > tomorrow.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
> > wrote:
> > >
> > >Using CentOS 6.5 and hadoop 1.2.1, all passed.
> > >
> > >+1 from me
> > >
> > >Gokhan
> > >
> > >
> > >
> > >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
> > wrote:
> > >
> > >> a),b),c),d) all passed on CentOS for me
> > >>
> > >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > >> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >> > From: ssvinarchuk@hortonworks.com
> > >> > To: dev@mahout.apache.org
> > >> >
> > >> > I did a), b), c), d) and all steps pass.
> > >> > +1
> > >> >
> > >> >
> > >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <
> gsingers@apache.org
> > >> >wrote:
> > >> >
> > >> > > +1 from me.
> > >> > >
> > >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <
> suneel_marthi@yahoo.com
> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Fixed the issues that were reported this week and restored FP
> > mining
> > >> > > into the codebase.
> > >> > > >
> > >> > > > Here's the URL for the final release in staging:-
> > >> > > >
> > >> > >
> > >>
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > >> > > >
> > >> > > > The artifacts have been signed with the
> > > following key:
> > >> > > > https://people.apache.org/keys/committer/smarthi.asc
> > >> > > >
> > >> > > >
> > >> > > > a) Verify that u can unpack the release (tar or zip)
> > >> > > > b) Verify u r able to compile the distro
> > >> > > > c) Run through the unit tests: mvn clean test
> > >> > > > d) Run the example scripts under
> > > $MAHOUT_HOME/examples/bin. Please
> > >> run
> > >> > > through all the different options in each script.
> > >> > > >
> > >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the
> release
> > >> to be
> > >> > > finalized.
> > >> > >
> > >> > > --------------------------------------------
> > >> > > Grant Ingersoll | @gsingers
> > >> > > http://www.lucidworks.com
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Sebastian Schelter <ss...@googlemail.com>.
I will try the next candidate again, so one vote is sure.
On 24.01.2014 23:54, "Suneel Marthi" <su...@yahoo.com> wrote:
> I am open to having the conversation (and a part of me feels that the
> clusteringId fix should be in 0.9).
>
> If we decide to incorporate that into 0.9, I need to rollback the 0.9
> Release that's presently out there in staging (for the 5th time in a row
> now).
> I am fine with doing that.
>
> What do you think we should do?
>
> a) Go ahead with 0.9 release without the fix for M-1410 .
> b) Rollback 0.9 and include the fix for M-1410
> c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes
> M-1410 and any other issues/enhancements that are fixed.
>
>
> I am leaning towards (b), my only concern being that from my experience in
> the past few weeks, it's become real hard to muster the minimum 3 +1 PMC
> votes required for a release to pass.
>
>
>
>
>
>
>
>
> On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
>
>
> Can we hold a separate discussion about whether the clustering id issue
> has to be in 0.9 while extending the vote deadline if necessary?
>
> If not, then all these votes are great and the release can go forward.
>
> If it is the sense that that fix has to be in, we should leave time for
> people to reverse their votes to -1.
>
>
>
>
> On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>
> wrote:
>
> Thanks to all those who volunteered. The voting for 0.9 Release closes
> tomorrow.
> >
> >
> >
> >
> >
> >
> >
> >
> >On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
> wrote:
> >
> >Using CentOS 6.5 and hadoop 1.2.1, all passed.
> >
> >+1 from me
> >
> >Gokhan
> >
> >
> >
> >On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
> >
> >> a),b),c),d) all passed on CentOS for me
> >>
> >> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> >> > Subject: Re: MAHOUT 0.9 Release - New URL
> >> > From: ssvinarchuk@hortonworks.com
> >> > To: dev@mahout.apache.org
> >> >
> >> > I did a), b), c), d) and all steps pass.
> >> > +1
> >> >
> >> >
> >> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> >> >wrote:
> >> >
> >> > > +1 from me.
> >> > >
> >> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <suneel_marthi@yahoo.com
> >
> >> > > wrote:
> >> > >
> >> > > > Fixed the issues that were reported this week and restored FP
> mining
> >> > > into the codebase.
> >> > > >
> >> > > > Here's the URL for the final release in staging:-
> >> > > >
> >> > >
> >>
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >> > > >
> >> > > > The artifacts have been signed with the
> > following key:
> >> > > > https://people.apache.org/keys/committer/smarthi.asc
> >> > > >
> >> > > >
> >> > > > a) Verify that u can unpack the release (tar or zip)
> >> > > > b) Verify u r able to compile the distro
> >> > > > c) Run through the unit tests: mvn clean test
> >> > > > d) Run the example scripts under
> > $MAHOUT_HOME/examples/bin. Please
> >> run
> >> > > through all the different options in each script.
> >> > > >
> >> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
> >> to be
> >> > > finalized.
> >> > >
> >> > > --------------------------------------------
> >> > > Grant Ingersoll | @gsingers
> >> > > http://www.lucidworks.com
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >>
> >>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
I am open to having the conversation (and a part of me feels that the clusteringId fix should be in 0.9).
If we decide to incorporate that into 0.9, I need to rollback the 0.9 Release that's presently out there in staging (for the 5th time in a row now).
I am fine with doing that.
What do you think we should do?
a) Go ahead with 0.9 release without the fix for M-1410 .
b) Rollback 0.9 and include the fix for M-1410
c) Go ahead with 0.9, have an interim 1.0 Release Candidate that includes M-1410 and any other issues/enhancements that are fixed.
I am leaning towards (b), my only concern being that, from my experience in the past few weeks, it's become real hard to muster the minimum 3 +1 PMC votes required for a release to pass.
On Friday, January 24, 2014 5:45 PM, Ted Dunning <te...@gmail.com> wrote:
Can we hold a separate discussion about whether the clustering id issue has to be in 0.9 while extending the vote deadline if necessary?
If not, then all these votes are great and the release can go forward.
If it is the sense that that fix has to be in, we should leave time for people to reverse their votes to -1.
On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com> wrote:
Thanks to all those who volunteered. The voting for 0.9 Release closes tomorrow.
>
>
>
>
>
>
>
>
>On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com> wrote:
>
>Using CentOS 6.5 and hadoop 1.2.1, all passed.
>
>+1 from me
>
>Gokhan
>
>
>
>On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com> wrote:
>
>> a),b),c),d) all passed on CentOS for me
>>
>> > Date: Thu, 23 Jan 2014 13:43:06 +0200
>> > Subject: Re: MAHOUT 0.9 Release - New URL
>> > From: ssvinarchuk@hortonworks.com
>> > To: dev@mahout.apache.org
>> >
>> > I did a), b), c), d) and all steps pass.
>> > +1
>> >
>> >
>> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
>> >wrote:
>> >
>> > > +1 from me.
>> > >
>> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
>> > > wrote:
>> > >
>> > > > Fixed the issues that were reported this week and restored FP mining
>> > > into the codebase.
>> > > >
>> > > > Here's the URL for the final release in staging:-
>> > > >
>> > >
>> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>> > > >
>> > > > The artifacts have been signed with the
> following key:
>> > > > https://people.apache.org/keys/committer/smarthi.asc
>> > > >
>> > > >
>> > > > a) Verify that u can unpack the release (tar or zip)
>> > > > b) Verify u r able to compile the distro
>> > > > c) Run through the unit tests: mvn clean test
>> > > > d) Run the example scripts under
> $MAHOUT_HOME/examples/bin. Please
>> run
>> > > through all the different options in each script.
>> > > >
>> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
>> to be
>> > > finalized.
>> > >
>> > > --------------------------------------------
>> > > Grant Ingersoll | @gsingers
>> > > http://www.lucidworks.com
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>
Re: MAHOUT 0.9 Release - New URL
Posted by Ted Dunning <te...@gmail.com>.
Can we hold a separate discussion about whether the clustering id issue has
to be in 0.9 while extending the vote deadline if necessary?
If not, then all these votes are great and the release can go forward.
If it is the sense that that fix has to be in, we should leave time for
people to reverse their votes to -1.
On Fri, Jan 24, 2014 at 2:22 PM, Suneel Marthi <su...@yahoo.com>wrote:
> Thanks to all those who volunteered. The voting for 0.9 Release closes
> tomorrow.
>
>
>
>
>
>
>
> On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com>
> wrote:
>
> Using CentOS 6.5 and hadoop 1.2.1, all passed.
>
> +1 from me
>
> Gokhan
>
>
>
> On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> > a),b),c),d) all passed on CentOS for me
> >
> > > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > From: ssvinarchuk@hortonworks.com
> > > To: dev@mahout.apache.org
> > >
> > > I did a), b), c), d) and all steps pass.
> > > +1
> > >
> > >
> > > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> > >wrote:
> > >
> > > > +1 from me.
> > > >
> > > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
> > > > wrote:
> > > >
> > > > > Fixed the issues that were reported this week and restored FP
> mining
> > > > into the codebase.
> > > > >
> > > > > Here's the URL for the final release in staging:-
> > > > >
> > > >
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > > >
> > > > > The artifacts have been signed with the
> following key:
> > > > > https://people.apache.org/keys/committer/smarthi.asc
> > > > >
> > > > >
> > > > > a) Verify that u can unpack the release (tar or zip)
> > > > > b) Verify u r able to compile the distro
> > > > > c) Run through the unit tests: mvn clean test
> > > > > d) Run the example scripts under
> $MAHOUT_HOME/examples/bin. Please
> > run
> > > > through all the different options in each script.
> > > > >
> > > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
> > to be
> > > > finalized.
> > > >
> > > > --------------------------------------------
> > > > Grant Ingersoll | @gsingers
> > > > http://www.lucidworks.com
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Thanks to all those who volunteered. The voting for 0.9 Release closes tomorrow.
On Friday, January 24, 2014 4:05 AM, Gokhan Capan <gk...@gmail.com> wrote:
Using CentOS 6.5 and hadoop 1.2.1, all passed.
+1 from me
Gokhan
On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com> wrote:
> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarchuk@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> >wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com>
> > > wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining
> > > into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:-
> > > >
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > >
> > > > a) Verify that u can unpack the release (tar or zip)
> > > > b) Verify u r able to compile the distro
> > > > c) Run through the unit tests: mvn clean test
> > > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > > > through all the different options in each script.
> > > >
> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release
> to be
> > > finalized.
> > >
> > > --------------------------------------------
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Gokhan Capan <gk...@gmail.com>.
Using CentOS 6.5 and hadoop 1.2.1, all passed.
+1 from me
Gokhan
On Thu, Jan 23, 2014 at 6:01 PM, Andrew Palumbo <ap...@outlook.com> wrote:
> a),b),c),d) all passed on CentOS for me
>
> > Date: Thu, 23 Jan 2014 13:43:06 +0200
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: ssvinarchuk@hortonworks.com
> > To: dev@mahout.apache.org
> >
> > I did a), b), c), d) and all steps pass.
> > +1
> >
> >
> > On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gsingers@apache.org
> >wrote:
> >
> > > +1 from me.
> > >
> > > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com> wrote:
> > >
> > > > Fixed the issues that were reported this week and restored FP mining
> > > > into the codebase.
> > > >
> > > > Here's the URL for the final release in staging:-
> > > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > > >
> > > > The artifacts have been signed with the following key:
> > > > https://people.apache.org/keys/committer/smarthi.asc
> > > >
> > > >
> > > > a) Verify that u can unpack the release (tar or zip)
> > > > b) Verify u r able to compile the distro
> > > > c) Run through the unit tests: mvn clean test
> > > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > > through all the different options in each script.
> > > >
> > > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > > > finalized.
> > >
> > > --------------------------------------------
> > > Grant Ingersoll | @gsingers
> > > http://www.lucidworks.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
RE: MAHOUT 0.9 Release - New URL
Posted by Andrew Palumbo <ap...@outlook.com>.
a),b),c),d) all passed on CentOS for me
> Date: Thu, 23 Jan 2014 13:43:06 +0200
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: ssvinarchuk@hortonworks.com
> To: dev@mahout.apache.org
>
> I did a), b), c), d) and all steps pass.
> +1
>
>
> On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> > +1 from me.
> >
> > On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com> wrote:
> >
> > > Fixed the issues that were reported this week and restored FP mining
> > > into the codebase.
> > >
> > > Here's the URL for the final release in staging:-
> > > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> > >
> > > The artifacts have been signed with the following key:
> > > https://people.apache.org/keys/committer/smarthi.asc
> > >
> > >
> > > a) Verify that u can unpack the release (tar or zip)
> > > b) Verify u r able to compile the distro
> > > c) Run through the unit tests: mvn clean test
> > > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > through all the different options in each script.
> > >
> > > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > > finalized.
> >
> > --------------------------------------------
> > Grant Ingersoll | @gsingers
> > http://www.lucidworks.com
> >
> >
> >
> >
> >
> >
>
Re: MAHOUT 0.9 Release - New URL
Posted by Sergey Svinarchuk <ss...@hortonworks.com>.
I did a), b), c), d) and all steps pass.
+1
On Thu, Jan 23, 2014 at 1:40 PM, Grant Ingersoll <gs...@apache.org> wrote:
> +1 from me.
>
> On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com> wrote:
>
> > Fixed the issues that were reported this week and restored FP mining
> > into the codebase.
> >
> > Here's the URL for the final release in staging:-
> > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >
> > The artifacts have been signed with the following key:
> > https://people.apache.org/keys/committer/smarthi.asc
> >
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c) Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > through all the different options in each script.
> >
> > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > finalized.
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
>
>
>
>
>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Grant Ingersoll <gs...@apache.org>.
+1 from me.
On Jan 22, 2014, at 5:55 PM, Suneel Marthi <su...@yahoo.com> wrote:
> Fixed the issues that were reported this week and restored FP mining into the codebase.
>
> Here's the URL for the final release in staging:-
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>
> The artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
>
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
> Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
Likewise, a) through d) work on an Amazon AMI and Ubuntu 12.04.
+1
On Wed, Jan 22, 2014 at 6:38 PM, Suneel Marthi <su...@yahoo.com> wrote:
> Same here. I did a), b), c) and d) too and all tests pass. Here's my +1,
> if my vote counts.
>
>
>
>
>
> On Wednesday, January 22, 2014 7:11 PM, Sebastian Schelter <ss...@apache.org> wrote:
>
> I did a) b) c) and d) without noting any problem so far. +1 from me.
>
> --sebastian
>
>
>
> On 01/22/2014 11:55 PM, Suneel Marthi wrote:
> > Fixed the issues that were reported this week and restored FP mining
> > into the codebase.
> >
> > Here's the URL for the final release in staging:-
> > https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
> >
> > The artifacts have been signed with the following key:
> > https://people.apache.org/keys/committer/smarthi.asc
> >
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c) Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > through all the different options in each script.
> >
> > Committers and PMC, need a minimum of 3 '+1' votes for the release to be
> > finalized.
> >
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Same here. I did a), b), c) and d) too and all tests pass. Here's my +1, if my vote counts.
On Wednesday, January 22, 2014 7:11 PM, Sebastian Schelter <ss...@apache.org> wrote:
I did a) b) c) and d) without noting any problem so far. +1 from me.
--sebastian
On 01/22/2014 11:55 PM, Suneel Marthi wrote:
> Fixed the issues that were reported this week and restored FP mining into the codebase.
>
> Here's the URL for the final release in staging:-
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>
> The artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
>
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
> Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
>
Re: MAHOUT 0.9 Release - New URL
Posted by Sebastian Schelter <ss...@apache.org>.
I did a) b) c) and d) without noting any problem so far. +1 from me.
--sebastian
On 01/22/2014 11:55 PM, Suneel Marthi wrote:
> Fixed the issues that were reported this week and restored FP mining into the codebase.
>
> Here's the URL for the final release in staging:-
> https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
>
> The artifacts have been signed with the following key:
> https://people.apache.org/keys/committer/smarthi.asc
>
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
> Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Fixed the issues that were reported this week and restored FP mining into the codebase.
Here's the URL for the final release in staging:-
https://repository.apache.org/content/repositories/orgapachemahout-1003/org/apache/mahout/mahout-distribution/0.9/
The artifacts have been signed with the following key:
https://people.apache.org/keys/committer/smarthi.asc
a) Verify that u can unpack the release (tar or zip)
b) Verify u r able to compile the distro
c) Run through the unit tests: mvn clean test
d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
Committers and PMC, need a minimum of 3 '+1' votes for the release to be finalized.
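For anyone who wants to script checks a) through c), a minimal sketch in Python could look like the one below. The archive name, the unpacked directory name, and the helpers `release_check_commands` and `run_checks` are assumptions made up for illustration; only the `mvn` invocations come from this thread. Step d) is left out because each example script offers several options that need to be chosen by hand.

```python
import subprocess


def release_check_commands(archive, src_dir):
    """Return (working_dir, argv) pairs for checks a) to c)."""
    return [
        # a) unpack the release archive in the current directory
        (".", ["tar", "-xzf", archive]),
        # b) compile the distribution without running the tests
        (src_dir, ["mvn", "clean", "install", "-DskipTests"]),
        # c) run the unit tests
        (src_dir, ["mvn", "clean", "test"]),
    ]


def run_checks(commands):
    """Run each command in order; raises on the first failure."""
    for cwd, argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


# Hypothetical file names; adjust to the artifact actually downloaded.
commands = release_check_commands(
    "mahout-distribution-0.9-src.tar.gz", "mahout-distribution-0.9"
)

# Dry run: print what would be executed. Call run_checks(commands) to run it.
for cwd, argv in commands:
    print(cwd, ":", " ".join(argv))
```

Keeping the command list separate from the runner makes it easy to review the steps before anything is executed.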
Re: MAHOUT 0.9 Release - New URL
Posted by Frank Scholten <fr...@frankscholten.nl>.
Updated trunk and Streaming K-Means works in sequential mode:
Average distance in cluster 0 [45]: 15835.645959
Average distance in cluster 1 [2]: 12655.384293
Cluster 2 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 3 [12]: 16639.304306
Average distance in cluster 4 [12466]: 1765.051250
Average distance in cluster 5 [613]: 7968.987864
Average distance in cluster 6 [453]: 11678.351990
Average distance in cluster 7 [7848]: 3475.257237
Average distance in cluster 8 [137]: 14040.611024
Cluster 9 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Num clusters: 10; maxDistance: 111156.247816
[Dunn Index] First: 0.002786
[Davies-Bouldin Index] First: 53.915866
Jan 22, 2014 11:29:51 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 33654 ms (Minutes: 0.5609)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
0,15835.645959,4305.418183,-6094.066658,13494.967198,14908.353662,19013.936496,25229.795662,45,train
1,12655.384293,12655.384293,-12655.384293,0.000000,12655.384293,25310.768587,37966.152880,2,train
3,16639.304306,8137.858191,76.596986,23652.805218,17032.177174,21993.360359,26116.861135,12,train
4,1765.051250,1912.041786,53.833129,665.221968,1398.456928,2116.252442,91200.149803,12466,train
5,7968.987864,3283.509392,3106.173001,5444.653631,7154.854277,9475.107969,20961.123807,613,train
6,11678.351990,3986.046231,80.428556,8688.530291,10657.331417,13992.879084,25697.590999,453,train
7,3475.257237,2849.263422,244.613872,1701.937225,2645.839526,4362.384712,111156.247816,7848,train
8,14040.611024,3847.956007,-4400.223235,11295.103900,13063.847142,16227.853884,22973.712042,137,train
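The per-cluster summary rows can be sanity-checked with a few lines of Python. This is only a sketch: it assumes the header sits on a single line, it pastes in just two of the rows from this run, and the helper names `parse_summary` and `weighted_mean_distance` are made up for illustration. It computes the mean of `distance.mean` weighted by the number of points in each cluster.

```python
import csv
import io

# Two rows copied from this run; the header is joined onto one line.
SUMMARY = """\
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,12655.384293,12655.384293,-12655.384293,0.000000,12655.384293,25310.768587,37966.152880,2,train
3,16639.304306,8137.858191,76.596986,23652.805218,17032.177174,21993.360359,26116.861135,12,train
"""


def parse_summary(text):
    """Parse summary rows into dicts with numeric mean and count fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "cluster": int(rec["cluster"]),
            "mean": float(rec["distance.mean"]),
            "count": int(rec["count"]),
        })
    return rows


def weighted_mean_distance(rows):
    """Average of distance.mean weighted by the point count per cluster."""
    total = sum(r["count"] for r in rows)
    return sum(r["mean"] * r["count"] for r in rows) / total


rows = parse_summary(SUMMARY)
print(len(rows), "clusters,", sum(r["count"] for r in rows), "points")
print("weighted mean distance: %.3f" % weighted_mean_distance(rows))
```

Pasting in all the rows gives a single figure that is easier to compare between runs than the per-cluster lines.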
On Wed, Jan 22, 2014 at 10:45 PM, Suneel Marthi <su...@yahoo.com> wrote:
> Thanks Andrew. I'll put a Release out soon.
>
>
>
>
> On Wednesday, January 22, 2014 3:52 PM, Andrew Palumbo <ap...@outlook.com> wrote:
>
>
> Everything seems to run well on my local machine:
>
> Checked out revision 1560364.
>
> CentOS 6
> Apache Maven 3.1.2-SNAPSHOT
> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> Java home: /usr/java/jdk1.6.0_45/jre
> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
> Hadoop 2.2.0
>
>
> mvn clean compile -DSkipTests [OK-Several Warnings]
> mvn clean test [PASSED ALL]
> mvn clean install -DskipTests [OK]
>
>
> $MAHOUT_LOCAL=true
>
> classify-20newsgroups.sh->1 [Accuracy 89.3529%]
> classify-20newsgroups.sh->2 [Accuracy 90.8317%]
> classify-20newsgroups.sh->3 [Accuracy 76.2746%]
> classify-20newsgroups.sh->4 [cleans up]
>
> cluster-reuters.sh->1 [20 clusters] -kmeans
> cluster-reuters.sh->2 [INFO: 20 clusters] -fkmeans
> cluster-reuters.sh->3 [OK] -lda
> cluster-reuters.sh->4 [10 (9) clusters- see attached] -streaming kmeans
>
> ./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
> ./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
> ./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]
>
> ./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]
>
>
>
>
> Attached is full output of cluster-reuters.sh->4 Streaming K-Means.
>
>
>
> From cluster-reuters.sh->4 Streaming K-Means:
>
> Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
> Average distance in cluster 1 [2816]: 3438.913758
> Average distance in cluster 2 [112]: 20617.345993
> Average distance in cluster 3 [4]: 32504.085379
> Average distance in cluster 4 [435]: 18476.579935
> Average distance in cluster 5 [27]: 21153.167574
> Average distance in cluster 6 [15480]: 2040.864416
> Average distance in cluster 7 [1711]: 5281.742482
> Average distance in cluster 8 [964]: 15762.976239
> Average distance in cluster 9 [28]: 19762.109632
> Num clusters: 10; maxDistance: 107106.379648
>
>
>
>
> [Dunn Index] First: 0.002272
> [Davies-Bouldin Index] First: 57.871266
> Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> 1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
> 2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
> 3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
> 4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
> 5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
> 6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
> 7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
> 8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
> 9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
>
>
>
>
>
> > From: ap.dev@outlook.com
> > To: dev@mahout.apache.org; user@mahout.apache.org
> > Subject: RE: MAHOUT 0.9 Release - New URL
> > Date: Wed, 22 Jan 2014 09:37:06 -0500
> >
> > will do!
> >
> > > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > > From: suneel_marthi@yahoo.com
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > To: dev@mahout.apache.org; user@mahout.apache.org
> > >
> > > Andrew M., Andrew P. and others,
> > >
> > > Sebastian and I fixed a few issues today (for 0.9):
> > >
> > > a) Removed the asf-email-examples.sh script and a few other scripts that
> > > should have been removed. Also removed references/invocations to algorithms
> > > that have been removed from the codebase.
> > > b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
> > > c) Resurrected the Frequent Pattern Mining implementation for 0.9.
> > >
> > > Please check out the latest code from trunk, run a build locally and
> > > run through the example scripts.
> > >
> > > Thanks and Regards.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >
> > > *factorize-movielens-1M.sh:*
> > > RMSE is: 0.8519064098265133
> > >
> > > Sample recommendations:
> > > 2229 [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > > 5848 [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > > 3728 [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > > 1252 [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > > 634 [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > > 5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > > 2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > > 4219 [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > > 91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > > 502 [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> > >
> > > factorize-netflix.sh:
> > > References a data set that Netflix took down after the competition; the
> > > script should at least mention that the data set is no longer "online".
> > >
> > >
> > > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >
> > > > *clustering-syntheticcontrol.sh*
> > > >
> > > > *Canopy:*
> > > > [snip]
> > > > 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047,
> 34.706,
> > > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267,
> 24.625,
> > > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363,
> 32.920,
> > > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238,
> 10.051,
> > > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352,
> 13.473,
> > > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195,
> 12.002,
> > > > 10.017, 17.149, 14.850, 10.890]
> > > > 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275,
> 30.410,
> > > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666,
> 33.745,
> > > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651,
> 25.528,
> > > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070,
> 16.604,
> > > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477,
> 10.342,
> > > > 16.430, 11.898, 15.366]
> > > > 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873,
> 31.817,
> > > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472,
> 32.322,
> > > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231,
> 18.264,
> > > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463,
> 22.296,
> > > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025,
> 21.750,
> > > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310,
> 19.136,
> > > > 15.285, 22.528, 20.657, 24.129]
> > > > 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686,
> 27.511,
> > > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607,
> 33.519,
> > > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457,
> 35.025,
> > > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773,
> 11.549,
> > > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639,
> 17.236,
> > > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583,
> 14.118,
> > > > 20.229, 11.131, 9.980, 10.720]
> > > > 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587,
> 31.032,
> > > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037,
> 32.979,
> > > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016,
> 24.553,
> > > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533,
> 21.542,
> > > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836,
> 21.939,
> > > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534,
> 16.846,
> > > > 16.546, 15.927, 18.084, 17.475]
> > > > 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387,
> 31.301,
> > > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657,
> 25.295,
> > > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349,
> 18.137,
> > > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645,
> 19.457,
> > > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915,
> 13.762,
> > > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235,
> 12.042,
> > > > 19.310, 12.999, 17.460]
> > > > 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089,
> 31.371,
> > > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650,
> 24.940,
> > > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634,
> 12.694,
> > > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729,
> 6.976,
> > > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > > 11.743, 11.699, 10.152]
> > > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > > >
> > > > *K-means:*
> > > > [snip]
> > > > 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885,
> 15.459,
> > > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448,
> 30.829,
> > > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577,
> 42.108,
> > > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149,
> 15.768,
> > > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576,
> 33.145,
> > > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570,
> 38.964,
> > > > 36.946, 36.900, 32.593, 31.563]
> > > > 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178,
> 19.885,
> > > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238,
> 43.868,
> > > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720,
> 25.478,
> > > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312,
> 21.743,
> > > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620,
> 47.847,
> > > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630,
> 13.459,
> > > > 18.967, 35.473, 30.146, 45.527]
> > > > 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938,
> 13.125,
> > > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607,
> 44.751,
> > > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901,
> 34.995,
> > > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350,
> 18.330,
> > > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851,
> 46.445,
> > > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348,
> 25.439,
> > > > 41.024, 37.105, 45.623, 43.589]
> > > > 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455,
> 24.652,
> > > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642,
> 40.566,
> > > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134,
> 27.542,
> > > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612,
> 22.262,
> > > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569,
> 37.067,
> > > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812,
> 19.357,
> > > > 24.508, 34.432, 32.155, 34.839]
> > > > 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650,
> 16.004,
> > > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567,
> 44.815,
> > > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141,
> 21.746,
> > > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652,
> 31.701,
> > > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563,
> 36.244,
> > > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193,
> 13.096,
> > > > 18.111, 14.754, 27.386, 27.026]
> > > > 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547,
> 17.178,
> > > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859,
> 42.363,
> > > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679,
> 32.597,
> > > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326,
> 14.988,
> > > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118,
> 40.078,
> > > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973,
> 24.044,
> > > > 39.404, 38.034, 46.458, 44.432]
> > > > 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858,
> 13.950,
> > > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936,
> 33.534,
> > > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313,
> 23.687,
> > > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980,
> 23.705,
> > > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514,
> 37.851,
> > > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098,
> 25.685,
> > > > 28.215, 34.940, 36.910, 39.749]
> > > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > > >
> > > > *Fuzzy k-means:*
> > > > [snip]
> > > > 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873,
> 31.817,
> > > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472,
> 32.322,
> > > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231,
> 18.264,
> > > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463,
> 22.296,
> > > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025,
> 21.750,
> > > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310,
> 19.136,
> > > > 15.285, 22.528, 20.657, 24.129]
> > > > 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686,
> 27.511,
> > > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607,
> 33.519,
> > > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457,
> 35.025,
> > > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773,
> 11.549,
> > > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639,
> 17.236,
> > > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583,
> 14.118,
> > > > 20.229, 11.131, 9.980, 10.720]
> > > > 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587,
> 31.032,
> > > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037,
> 32.979,
> > > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016,
> 24.553,
> > > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533,
> 21.542,
> > > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836,
> 21.939,
> > > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534,
> 16.846,
> > > > 16.546, 15.927, 18.084, 17.475]
> > > > 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387,
> 31.301,
> > > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657,
> 25.295,
> > > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349,
> 18.137,
> > > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645,
> 19.457,
> > > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915,
> 13.762,
> > > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235,
> 12.042,
> > > > 19.310, 12.999, 17.460]
> > > > 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089,
> 31.371,
> > > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650,
> 24.940,
> > > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634,
> 12.694,
> > > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729,
> 6.976,
> > > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > > 11.743, 11.699, 10.152]
> > > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > > >
> > > > *Dirichlet and Meanshift:*
> > > > Already detailed in M-1400, deprecated jobs still referenced.
> > > >
> > > >
> > > >
> > > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > > >
> > > >> *cluster-reuters.sh*
> > > >> *k-means:*
> > > >>
> > > >> [snip]
> > > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > > >> Top Terms:
> > > >> banks => 3.841823268955143
> > > >> bank => 3.80633066361209
> > > >> debt => 3.28065219870794
> > > >> said => 2.5965700942088583
> > > >> he => 2.335682813857497
> > > >> foreign => 2.2217853688201403
> > > >> billion => 2.1970193848291335
> > > >> would => 1.9932392063955617
> > > >> loans => 1.9309276792854233
> > > >> interest => 1.787324501938
> > > >> have => 1.762981951432578
> > > >> its => 1.7615109954971866
> > > >> which => 1.5822081148036862
> > > >> has => 1.5600708189041956
> > > >> dlrs => 1.5571038313005996
> > > >> finance => 1.5539758811252924
> > > >> new => 1.5176015811577555
> > > >> had => 1.5138723701401844
> > > >> brazil => 1.5083369853593172
> > > >> payments => 1.4539044255886517
> > > >> Weight : [props - optional]: Point:
> > > >>
> > > > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> > > > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > > > >> Top Terms:
> > > > >>   vs => 6.126130791333171
> > > > >>   net => 4.012191567277523
> > > > >>   cts => 3.822006848832744
> > > > >>   shr => 3.6786004856764527
> > > > >>   mln => 2.9011643584038698
> > > > >>   loss => 2.788368861463607
> > > > >>   qtr => 2.714140225051522
> > > > >>   revs => 2.4739861236454717
> > > > >>   profit => 1.8146888090247015
> > > > >>   note => 1.7977163272138388
> > > > >>   dlrs => 1.6164390808155846
> > > > >>   avg => 1.3901765773336587
> > > > >>   shrs => 1.3856326531419314
> > > > >>   mths => 1.3168717272038506
> > > > >>   4th => 1.2161158425617289
> > > > >>   oper => 1.182419473776814
> > > > >>   year => 1.178086061733047
> > > > >>   nine => 1.0670554836445316
> > > > >>   3rd => 1.041334410056592
> > > > >>   inc => 1.0019361981554935
> > > >> Weight : [props - optional]: Point:
> > > >>
> > > >>
> > > >> Inter-Cluster Density: 0.45562152681859414
> > > >> Intra-Cluster Density: 0.6952712632167628
> > > >> CDbw Inter-Cluster Density: 0.0
> > > >> CDbw Intra-Cluster Density: 16.486930227598684
> > > >> CDbw Separation: 194.49005884464628
> > > >>
> > > >> *fuzzy k-means:*
> > > > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > > > >> 0.01:0.005, 0.02:0.002, 0.0
> > > > >> Top Terms:
> > > > >>   said => 1.8665592354713065
> > > > >>   its => 1.1335212213411592
> > > > >>   pct => 1.0862816801353348
> > > > >>   dlrs => 1.0854998884993752
> > > > >>   mln => 1.043163996400643
> > > > >>   from => 0.9684961110525736
> > > > >>   has => 0.912161511978058
> > > > >>   company => 0.8754186972808333
> > > > >>   mar => 0.8675333452422878
> > > > >>   inc => 0.7678617590362815
> > > > >>   would => 0.7610968883652675
> > > > >>   he => 0.7459988770503974
> > > > >>   which => 0.7435613119406804
> > > > >>   year => 0.7302840632748394
> > > > >>   u.s => 0.7281061062439116
> > > > >>   shares => 0.7260764102983083
> > > > >>   corp => 0.7179807367808658
> > > > >>   new => 0.7044203783157115
> > > > >>   stock => 0.6962010978721442
> > > > >>   have => 0.6464265467298506
> > > > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > > > >> 0.01:0.004, 0.02:0.002, 0.02
> > > > >> Top Terms:
> > > > >>   said => 1.864911184196927
> > > > >>   dlrs => 1.199286689822081
> > > > >>   mln => 1.1802134783562215
> > > > >>   pct => 1.1529704214798124
> > > > >>   its => 1.1184398851519701
> > > > >>   from => 1.016647848050332
> > > > >>   company => 0.894703604722841
> > > > >>   mar => 0.879986159541356
> > > > >>   has => 0.8642799128491316
> > > > >>   year => 0.8271823503717782
> > > > >>   inc => 0.7871293745341424
> > > > >>   corp => 0.737705498468879
> > > > >>   which => 0.722975201852743
> > > > >>   would => 0.708000816484415
> > > > >>   u.s => 0.7073294276173905
> > > > >>   billion => 0.7055723996916351
> > > > >>   he => 0.7042684217823294
> > > > >>   new => 0.6834737905434939
> > > > >>   shares => 0.6753327384172428
> > > > >>   stock => 0.6576225144041699
> > > > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > > > >> 0.01:0.006, 0.02:0.002, 0.02
> > > > >> Top Terms:
> > > > >>   said => 1.8796076179735086
> > > > >>   its => 1.172025965452378
> > > > >>   dlrs => 1.130422792460914
> > > > >>   pct => 1.082038255241358
> > > > >>   mln => 1.0772146872767114
> > > > >>   company => 0.9662235879639138
> > > > >>   from => 0.9473172871605616
> > > > >>   has => 0.9224712965830099
> > > > >>   mar => 0.8769325856924421
> > > > >>   inc => 0.8360245257169788
> > > > >>   shares => 0.8334595641384324
> > > > >>   stock => 0.7704621839612175
> > > > >>   corp => 0.7682400250301806
> > > > >>   which => 0.7389988207856137
> > > > >>   would => 0.7339708917389389
> > > > >>   year => 0.7088414843731325
> > > > >>   new => 0.7038109468655172
> > > > >>   he => 0.6993994455501005
> > > > >>   u.s => 0.6772649147622415
> > > > >>   share => 0.6241804830055171
> > > >>
> > > >> *lda:*
> > > >>
> > > >> [snip]
> > > >> 21539
> > > >>
> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > > >> 21540
> > > >>
> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > > >> 21541
> > > >>
> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > > >> 21542
> > > >>
> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > > >> 21543
> > > >>
> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > > >> 21544
> > > >>
> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > > >> 21545
> > > >>
> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > > >> 21546
> > > >>
> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > > >> 21547
> > > >>
> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > > >> 21548
> > > >>
> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > > >> 21549
> > > >>
> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > > >> 21550
> > > >>
> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > > >> 21551
> > > >>
> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > > >> 21552
> > > >>
> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > > >> 21553
> > > >>
> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > > >> 21554
> > > >>
> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > > >> 21555
> > > >>
> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > > >> 21556
> > > >>
> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > > >> 21557
> > > >>
> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > > >> 21558
> > > >>
> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > > >> 21559
> > > >>
> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > > >> 21560
> > > >>
> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > > >> 21561
> > > >>
> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > > >> 21562
> > > >>
> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > > >> 21563
> > > >>
> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > > >> 21564
> > > >>
> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > > >> 21565
> > > >>
> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > > >> 21566
> > > >>
> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > > >> 21567
> > > >>
> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > > >> 21568
> > > >>
> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > > >> 21569
> > > >>
> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > > >> 21570
> > > >>
> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > > >> 21571
> > > >>
> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > > >> 21572
> > > >>
> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > > >> 21573
> > > >>
> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > > >> 21574
> > > >>
> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > > >> 21575
> > > >>
> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > > >> 21576
> > > >>
> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > > >> 21577
> > > >>
> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > > >>
> > > >>
> > > >> *Streaming k-means:*
> > > >>
> > > >> [snip]
> > > >> INFO: Number of Centroids: 0
> > > > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > > > >> WARNING: job_local23982482_0001
> > > > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > > > >>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > > > >>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > > > >>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > > > >>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > > > >>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > > > >>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > > > >>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > > > >>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > > > >>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > > > >>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > > >>
> > > >> [snip]
> > > >>
> > > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > > >> Num clusters: 0; maxDistance: 0.000000
> > > >> [Dunn Index] First: Infinity
> > > >> [Davies-Bouldin Index] First: NaN
> > > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > >>
> > > >>
> > > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> > > >> andrew.musselman@gmail.com> wrote:
> > > >>
> > > >>> *classify-20newsgroups.sh*
> > > >>>
> > > >>> *Complementary naive bayes:*
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances : 11207 98.9406%
> > > >>> Incorrectly Classified Instances : 120 1.0594%
> > > >>> Total Classified Instances : 11327
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t <--Classified as
> > > > >>> 475   0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0 | 478 a = alt.atheism
> > > > >>>   0 597   1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0 | 605 b = comp.graphics
> > > > >>>   0   1 620   3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0 | 627 c = comp.os.ms-windows.misc
> > > > >>>   1   1   1 593   2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0 | 599 d = comp.sys.ibm.pc.hardware
> > > > >>>   0   1   1   0 568   0   1   0   0   0   1   1   2   0   0   0   0   1   0   0 | 576 e = comp.sys.mac.hardware
> > > > >>>   0   4   2   0   0 581   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 587 f = comp.windows.x
> > > > >>>   0   0   0   1   2   0 571   3   0   0   1   1   4   1   0   0   0   0   0   0 | 584 g = misc.forsale
> > > > >>>   0   0   0   1   0   0   0 589   1   0   0   1   1   0   0   0   0   0   0   0 | 593 h = rec.autos
> > > > >>>   0   0   0   0   0   0   0   1 565   0   0   0   0   0   1   0   0   0   0   0 | 567 i = rec.motorcycles
> > > > >>>   0   0   0   0   0   0   0   0   0 600   2   0   0   0   1   0   0   0   0   0 | 603 j = rec.sport.baseball
> > > > >>>   0   0   0   0   0   0   0   0   0   1 584   0   0   0   0   0   0   0   0   0 | 585 k = rec.sport.hockey
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0 579   0   0   0   0   0   1   0   0 | 580 l = sci.crypt
> > > > >>>   0   0   0   1   3   0   2   0   0   2   0   0 567   1   2   1   0   0   0   0 | 579 m = sci.electronics
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0   0   1 605   0   0   0   0   0   0 | 606 n = sci.med
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   0 602   0   0   0   0   0 | 602 o = sci.space
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0 602   0   0   1   0 | 604 p = soc.religion.christian
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 556   0   0   0 | 556 q = talk.politics.mideast
> > > > >>>   0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0 568   0   0 | 571 r = talk.politics.guns
> > > > >>>  11   0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4 338   2 | 369 s = talk.religion.misc
> > > > >>>   0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0 447 | 456 t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa 0.9806
> > > >>> Accuracy 98.9406%
> > > >>> Reliability 94.0932%
> > > >>> Reliability (standard deviation) 0.2163
> > > >>>
> > > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > > >>> + echo 'Testing on holdout set'
> > > >>> Testing on holdout set
> > > > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > > >>>
> > > >>> [snip]
> > > >>>
> > > >>> INFO: Complementary Results:
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances : 6715 89.3071%
> > > >>> Incorrectly Classified Instances : 804 10.6929%
> > > >>> Total Classified Instances : 7519
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t <--Classified as
> > > > >>> 298   0   0   0   0   0   0   0   0   1   0   0   0   1   2   5   1   0  13   0 | 321 a = alt.atheism
> > > > >>>   0 298  11   6   1  12   2   2   1   1   3   8   3   4   2   4   1   4   4   1 | 368 b = comp.graphics
> > > > >>>   1  17 286  16   4   9   6   3   2   0   1   0   1   7   1   0   2   1   0   1 | 358 c = comp.os.ms-windows.misc
> > > > >>>   2   6  11 309   9   5  14   8   1   0   2   0   6   4   2   0   1   2   1   0 | 383 d = comp.sys.ibm.pc.hardware
> > > > >>>   0  10   8   7 334   7   5   5   2   0   3   0   2   1   1   0   1   1   0   0 | 387 e = comp.sys.mac.hardware
> > > > >>>   1  13   7   8   2 355   2   0   2   0   0   5   1   1   3   0   0   1   0   0 | 401 f = comp.windows.x
> > > > >>>   0   7  11  29  12   9 268  16   8   4   3   2   6   4   2   1   3   1   2   3 | 391 g = misc.forsale
> > > > >>>   0   1   0   0   3   0   7 362   8   2   2   1   2   0   2   0   1   2   0   4 | 397 h = rec.autos
> > > > >>>   0   0   0   1   0   0   1   0 423   0   0   0   2   1   0   1   0   0   0   0 | 429 i = rec.motorcycles
> > > > >>>   0   0   1   0   0   0   0   2   2 371   8   0   2   3   0   2   0   0   0   0 | 391 j = rec.sport.baseball
> > > > >>>   0   0   1   0   0   0   1   0   0   2 409   0   0   0   0   0   0   0   0   1 | 414 k = rec.sport.hockey
> > > > >>>   0   0   1   2   1   0   1   0   0   0   0 404   0   0   0   0   0   1   0   1 | 411 l = sci.crypt
> > > > >>>   0   5   4  11   1   3   7   9   2   5   3   3 339   2   6   0   1   1   2   1 | 405 m = sci.electronics
> > > > >>>   0   4   0   1   0   0   0   1   0   1   1   0   3 367   3   1   2   0   0   0 | 384 n = sci.med
> > > > >>>   0   1   2   0   1   0   2   0   0   1   0   0   1   1 375   0   1   0   0   0 | 385 o = sci.space
> > > > >>>   4   2   1   1   0   0   1   1   2   0   0   1   1   5   1 367   4   0   1   1 | 393 p = soc.religion.christian
> > > > >>>   0   1   0   0   0   0   0   0   0   2   0   0   0   0   0   2 378   0   1   0 | 384 q = talk.politics.mideast
> > > > >>>   0   0   0   0   0   2   1   1   1   1   0   3   0   3   0   0   2 319   2   4 | 339 r = talk.politics.guns
> > > > >>>  32   0   0   1   0   0   0   0   0   1   1   1   0   2   2  26   5   7 175   6 | 259 s = talk.religion.misc
> > > > >>>   0   0   0   2   0   0   0   0   0   1   2   2   0   1   2   1  10  18   2 278 | 319 t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa 0.8594
> > > >>> Accuracy 89.3071%
> > > >>> Reliability 84.611%
> > > >>> Reliability (standard deviation) 0.2148
> > > >>>
> > > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> > > >>>
> > > >>>
> > > >>> *Naive bayes:*
> > > >>> INFO: Standard NB Results:
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances : 11286 99.0869%
> > > >>> Incorrectly Classified Instances : 104 0.9131%
> > > >>> Total Classified Instances : 11390
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t <--Classified as
> > > > >>> 474   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2   1 | 477 a = alt.atheism
> > > > >>>   0 566   0   2   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 569 b = comp.graphics
> > > > >>>   0  10 590  29   2   4   1   0   0   0   0   0   1   0   0   0   0   0   0   1 | 638 c = comp.os.ms-windows.misc
> > > > >>>   0   0   0 596   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 596 d = comp.sys.ibm.pc.hardware
> > > > >>>   0   0   0   0 575   0   1   0   0   0   0   0   1   0   0   0   0   0   0   0 | 577 e = comp.sys.mac.hardware
> > > > >>>   0   2   2   2   0 593   1   0   0   0   0   0   0   0   1   0   0   0   0   0 | 601 f = comp.windows.x
> > > > >>>   0   0   0   1   0   0 589   1   0   0   1   0   2   0   0   0   0   0   0   0 | 594 g = misc.forsale
> > > > >>>   0   0   0   0   0   0   0 594   0   0   0   0   0   0   0   0   0   0   0   0 | 594 h = rec.autos
> > > > >>>   0   0   0   0   0   0   0   0 611   0   0   0   0   0   0   0   0   0   0   0 | 611 i = rec.motorcycles
> > > > >>>   0   0   0   0   0   0   0   0   0 616   1   0   0   0   0   0   0   0   0   0 | 617 j = rec.sport.baseball
> > > > >>>   0   0   0   0   0   0   1   0   0   0 620   0   0   0   0   0   0   0   0   0 | 621 k = rec.sport.hockey
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0 580   0   0   0   0   0   1   0   0 | 581 l = sci.crypt
> > > > >>>   0   0   0   3   1   0   0   0   0   0   0   0 571   0   0   0   0   0   0   0 | 575 m = sci.electronics
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0   0   2 583   0   0   0   0   0   0 | 585 n = sci.med
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   1 599   0   0   0   0   0 | 600 o = sci.space
> > > > >>>   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0 615   0   0   0   0 | 616 p = soc.religion.christian
> > > > >>>   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1 560   0   0   0 | 562 q = talk.politics.mideast
> > > > >>>   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0 548   0   1 | 551 r = talk.politics.guns
> > > > >>>  10   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   2 344   1 | 359 s = talk.religion.misc
> > > > >>>   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   2   0 462 | 466 t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa 0.9847
> > > >>> Accuracy 99.0869%
> > > >>> Reliability 94.3334%
> > > >>> Reliability (standard deviation) 0.2169
> > > >>>
> > > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > > >>> + echo 'Testing on holdout set'
> > > >>> Testing on holdout set
> > > >>>
> > > >>> [snip]
> > > >>>
> > > >>> INFO: Standard NB Results:
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances : 6718 90.1019%
> > > >>> Incorrectly Classified Instances : 738 9.8981%
> > > >>> Total Classified Instances : 7456
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t <--Classified as
> > > > >>> 294   0   0   0   0   0   0   0   0   0   0   2   0   1   1   6   1   1  16   0 | 322 a = alt.atheism
> > > > >>>   0 345   6  14   6  11   6   0   0   0   0   5   7   1   3   0   0   0   0   0 | 404 b = comp.graphics
> > > > >>>   2  29 177  78  22  19   9   1   0   0   0   4   2   0   1   1   0   0   1   1 | 347 c = comp.os.ms-windows.misc
> > > > >>>   1   9   2 335  18   2  10   0   0   0   1   0   8   0   0   0   0   0   0   0 | 386 d = comp.sys.ibm.pc.hardware
> > > > >>>   1   4   2  13 347   3   5   1   0   0   1   0   7   1   0   0   0   1   0   0 | 386 e = comp.sys.mac.hardware
> > > > >>>   0  20   0   4   0 352   4   0   0   0   0   0   1   1   3   0   1   0   1   0 | 387 f = comp.windows.x
> > > > >>>   0   2   0  21   5   1 323   7   2   2   0   2  12   0   3   0   0   0   0   1 | 381 g = misc.forsale
> > > > >>>   0   1   0   0   1   0  15 363   8   1   0   0   4   1   0   0   0   1   0   1 | 396 h = rec.autos
> > > > >>>   0   1   0   0   0   0   6   6 370   0   0   0   0   1   0   0   0   0   1   0 | 385 i = rec.motorcycles
> > > > >>>   1   0   0   1   1   0   2   1   2 362   5   0   2   0   0   0   0   0   0   0 | 377 j = rec.sport.baseball
> > > > >>>   0   0   0   1   2   0   0   0   0   3 371   0   0   0   0   0   0   0   0   1 | 378 k = rec.sport.hockey
> > > > >>>   0   3   1   0   1   0   2   0   0   0   0 396   0   1   0   0   1   1   1   3 | 410 l = sci.crypt
> > > > >>>   0   7   0   7   7   2   6   4   0   0   0   1 369   2   2   0   0   0   0   2 | 409 m = sci.electronics
> > > > >>>   0   3   0   2   1   0   2   0   0   0   0   1   4 383   4   0   0   1   0   4 | 405 n = sci.med
> > > > >>>   0   5   0   0   1   0   3   0   0   0   0   0   1   0 374   1   0   0   1   1 | 387 o = sci.space
> > > > >>>   6   2   0   1   1   0   0   1   0   1   0   0   1   5   0 352   2   1   7   1 | 381 p = soc.religion.christian
> > > > >>>   1   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0 373   1   0   1 | 378 q = talk.politics.mideast
> > > > >>>   0   0   0   0   0   0   1   0   1   0   0   2   0   0   0   0   0 346   2   7 | 359 r = talk.politics.guns
> > > > >>>  26   1   0   1   0   0   0   2   0   1   1   0   0   1   1  20   2   6 200   7 | 269 s = talk.religion.misc
> > > > >>>   1   0   0   0   0   0   0   2   0   0   1   0   0   2   2   0   1  14   0 286 | 309 t = talk.politics.misc
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa 0.8726
> > > >>> Accuracy 90.1019%
> > > >>> Reliability 85.4491%
> > > >>> Reliability (standard deviation) 0.2222
> > > >>>
> > > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> > > >>>
> > > >>> *SGD:*
> > > >>> 7532 test files
> > > >>>
> > > >>> =======================================================
> > > >>> Summary
> > > >>> -------------------------------------------------------
> > > >>> Correctly Classified Instances : 5649 75%
> > > >>> Incorrectly Classified Instances : 1883 25%
> > > >>> Total Classified Instances : 7532
> > > >>>
> > > >>> =======================================================
> > > >>> Confusion Matrix
> > > >>> -------------------------------------------------------
> > > >>>    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
> > > >>>  186   6   3  10   5   0  33   4  13  15   7   1  24  15   3  15   5   5  29  15  | 394 a = sci.space
> > > >>>    5 309   0   3   2   5   0   0   0   1   9  21   2   0   0  18   4   4   1   1  | 385 b = comp.sys.mac.hardware
> > > >>>    4   1 101   3   0   1  63   0   7   0   1   1   5  16   3   0   3   7   1  34  | 251 c = talk.religion.misc
> > > >>>   11  12   1 265   1  10   3   0   0  17  10  11   5   2   0  11   3   6  21   0  | 389 d = comp.graphics
> > > >>>    2   1   1   0 349   2   3   0   3   2   6   1   5   1   0   2  15   2   1   2  | 398 e = rec.motorcycles
> > > >>>    7  20   3  19   2 254   6   0   2  11   2  39   7   2   0   4   2   2   9   3  | 394 f = comp.os.ms-windows.misc
> > > >>>    2   1  13   0   0   0 247   0   1   1   3   0   6   2   4   0   2   3   5  29  | 319 g = alt.atheism
> > > >>>    1   1   0   0   2   0   2 361   0   1   2   0   2   0   0   1   3  22   0   1  | 399 h = rec.sport.hockey
> > > >>>    3   0   3   1   0   0   5   0 161   0   1   2  12 102   0   0   1   2  11   6  | 310 i = talk.politics.misc
> > > >>>    2   8   0  19   0  19   0   0   1 294  10  11   4   2   0   5   0   3  11   6  | 395 j = comp.windows.x
> > > >>>    2  10   0   1   1   0   0   0   0   1 347  13   2   1   0   5   3   2   2   0  | 390 k = misc.forsale
> > > >>>    1  36   0   6   1  25   0   0   1   6  10 257   2   1   0  34   6   0   6   0  | 392 l = comp.sys.ibm.pc.hardware
> > > >>>    2   2   2   2   1   0  12   0   0   6  10   4 312   5   2  13  11   3   3   6  | 396 m = sci.med
> > > >>>    2   0   3   2   1   0   0   1  13   0   5   1   2 314   2   0   2   2  10   4  | 364 n = talk.politics.guns
> > > >>>    1   0   2   1   1   0  34   1  33   1   3   0   1   8 271   1   4   5   6   3  | 376 o = talk.politics.mideast
> > > >>>    3  14   0   8   2   8   3   1   1   7  12  29   6   2   1 245  13   2  32   4  | 393 p = sci.electronics
> > > >>>    3   3   0   2  11   0   1   0   2   1  11   6   4   2   0  11 330   4   4   1  | 396 q = rec.autos
> > > >>>    0   0   1   0   1   0   4  12   3   1   3   0   0   0   0   5   6 359   1   1  | 397 r = rec.sport.baseball
> > > >>>    0   1   0   0   0   1   0   0   3   3   0   0   3   2   1   6   1   6 366   3  | 396 s = sci.crypt
> > > >>>    0   2  11   1   1   0  40   0   1   2   3   4   2   1   0   5   0   2   2 321  | 398 t = soc.religion.christian
> > > >>>
> > > >>> =======================================================
> > > >>> Statistics
> > > >>> -------------------------------------------------------
> > > >>> Kappa 0.7073
> > > >>> Accuracy 75%
> > > >>> Reliability 70.6238%
> > > >>> Reliability (standard deviation) 0.2187
> > > >>> Log-likelihood mean : -1.1182
> > > >>> 25%-ile : -1.6911
> > > >>> 75%-ile : -0.0803
> > > >>>
> > > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <
> suneel_marthi@yahoo.com>wrote:
> > > >>>
> > > >>>> Thanks Andrew for reporting that. I rolled back the release to fix this and a few other issues.
> > > >>>>
> > > >>>> We have removed asf-examples*.sh from trunk as the sample file at the URL mentioned in ur email is not available.
> > > >>>> This is something we need to fix and restore in 1.0.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> > > >>>> ap.dev@outlook.com> wrote:
> > > >>>>
> > > >>>> from the asf-email-examples.sh script:
> > > >>>>
> > > >>>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> > > >>>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
> > > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > > >>>>
> > > >>>> It looks like the http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout link is down.
> > > >>>>
> > > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > > >>>> > From: andrew.musselman@gmail.com
> > > >>>> > To: dev@mahout.apache.org
> > > >>>> >
> > > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > > >>>> >
> > > >>>> >
> > > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> > > >>>> suneel_marthi@yahoo.com>wrote:
> > > >>>> >
> > > >>>> > > Thanks Andrew M., I see that some of the example scripts need to be fixed as they still refer to the deprecated algorithms.
> > > >>>> > > I see that the Streaming KMeans has failed for you as well.
> > > >>>> > >
> > > >>>> > > I'll be rolling back the release today to fix these issues.
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > > >>>> > > andrew.musselman@gmail.com> wrote:
> > > >>>> > >
> > > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit Linux AMI from tarball.
> > > >>>> > >
> > > >>>> > > All tests pass.
> > > >>>> > >
> > > >>>> > > *Output of examples:*
> > > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > > >>>> > > <http://mahout.apache.org>:*
> > > >>>> > > *recommendations:*
> > > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > > >>>> > > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > > >>>> > > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > > >>>> > > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > > >>>> > > 8 [12758:1.0,19409:1.0,11112:1.0]
> > > >>>> > > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > > >>>> > > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > > >>>> > > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > > >>>> > > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > >>>> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > >>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > >>>> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > >>>> > > [snip]
> > > >>>> > >
> > > >>>> > > *clustering; kmeans:*
> > > >>>> > > [snip]
> > > >>>> > > Weight : [props - optional]: Point:
> > > >>>> > > 1.0 :
> > > >>>> > > [distance-squared=1.0193102046188427]:
> > > >>>> > >
> /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > > >>>> 7573:0.204,
> > > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> > > >>>> 9779:0.159,
> > > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065,
> 17007:0.244,
> > > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140,
> 24649:0.095,
> > > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156,
> 31559:0.075,
> > > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075,
> 38378:0.130,
> > > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > > >>>> > > 1.0 : [distance-squared=0.9823018320457279]:
> > > >>>> > >
> /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > > >>>> 5336:0.106,
> > > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> > > >>>> 7832:0.072,
> > > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094,
> 19359:0.177,
> > > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098,
> 24649:0.092,
> > > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150,
> 30459:0.072,
> > > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113,
> 36491:0.073,
> > > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106,
> 45775:0.083]
> > > >>>> > > 1.0 : [distance-squared=0.9509142993214911]:
> > > >>>> > >
> > > >>>>
> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > > >>>> > > 4419:0.076,
> > > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > > >>>> 7235:0.048,
> > > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > > >>>> 7683:0.077,
> > > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > > >>>> 10225:0.081,
> > > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139,
> 11663:0.087,
> > > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051,
> 14352:0.061,
> > > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149,
> 19774:0.124,
> > > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078,
> 23974:0.105,
> > > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151,
> 25128:0.052,
> > > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046,
> 31727:0.104,
> > > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069,
> 33112:0.177,
> > > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042,
> 35795:0.066,
> > > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200,
> 37111:0.071,
> > > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079,
> 41155:0.167,
> > > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > > >>>> > > 43685:0.086, 44077:0.308,
> > > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052,
> 46766:0.074,
> > > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > >>>> > > [snip]
> > > >>>> > >
> > > >>>> > > *clustering; dirichlet:*
> > > >>>> > > Get this complaint:
> > > >>>> > > Running Dirichlet with K = 8
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > > >>>> > > Unknown program 'dirichlet' chosen.
> > > >>>> > >
> > > >>>> > > *clustering: minhash:*
> > > >>>> > > Running Minhash
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > > >>>> > > Unknown program 'minhash' chosen.
> > > >>>> > >
> > > >>>> > > *classification; standard:*
> > > >>>> > > =======================================================
> > > >>>> > > Summary
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Correctly Classified Instances : 5384 87.7874%
> > > >>>> > > Incorrectly Classified Instances : 749 12.2126%
> > > >>>> > > Total Classified Instances : 6133
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Confusion Matrix
> > > >>>> > > -------------------------------------------------------
> > > >>>> > >     a    b    c    d  <--Classified as
> > > >>>> > >  2949    7  531   25  | 3512 a = dev
> > > >>>> > >     0    0    0    0  |    0 b = general
> > > >>>> > >    99    8 1763    8  | 1878 c = user
> > > >>>> > >    41    1   29  672  |  743 d = commits
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Statistics
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Kappa 0.7877
> > > >>>> > > Accuracy 87.7874%
> > > >>>> > > Reliability 53.658%
> > > >>>> > > Reliability (standard deviation) 0.4911
> > > >>>> > >
> > > >>>> > > *classification; complementary:*
> > > >>>> > > =======================================================
> > > >>>> > > Summary
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Correctly Classified Instances : 5530 90.1679%
> > > >>>> > > Incorrectly Classified Instances : 603 9.8321%
> > > >>>> > > Total Classified Instances : 6133
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Confusion Matrix
> > > >>>> > > -------------------------------------------------------
> > > >>>> > >     a    b    c    d  <--Classified as
> > > >>>> > >  3168    0  276   68  | 3512 a = dev
> > > >>>> > >     0    0    0    0  |    0 b = general
> > > >>>> > >   196    0 1652   30  | 1878 c = user
> > > >>>> > >    25    0    8  710  |  743 d = commits
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Statistics
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Kappa 0.8259
> > > >>>> > > Accuracy 90.1679%
> > > >>>> > > Reliability 54.7459%
> > > >>>> > > Reliability (standard deviation) 0.5005
> > > >>>> > >
> > > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > > >>>> > >
> > > >>>> > > *classification; sgd, with three categories:*
> > > >>>> > > Running SGD Training
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > >>>> > > 24168 training files
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> > > >>>> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> > > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> > > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> > > >>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> > > >>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
> > > >>>> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> > > >>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> > > >>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> > > >>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> > > >>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> > > >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
> > > >>>> > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > >>>> > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > >>>> > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
> > > >>>> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >>>> > >     at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > > >>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > > >>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > > >>>> > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > >>>> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > >>>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >>>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>>> > >     at java.lang.Thread.run(Thread.java:701)
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > >
> > > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > > >>>> > > andrew.musselman@gmail.com> wrote:
> > > >>>> > >
> > > >>>> > > > Trying out the build today
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> > > >>>> suneel_marthi@yahoo.com
> > > >>>> > > >wrote:
> > > >>>> > > >
> > > >>>> > > >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9 Release. I will be rerolling the release today (in the next few hrs) and putting out a new release candidate in staging.
> > > >>>> > > >>
> > > >>>> > > >> Thanks for reporting this Andrew P.
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > > >>>> > > ap.dev@outlook.com>
> > > >>>> > > >> wrote:
> > > >>>> > > >>
> > > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the Hadoop setup. Ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when I'm sure I've got Hadoop set up right.
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
> > > >>>> > > >> $MAHOUT_LOCAL=true
> > > >>>> > > >> Hadoop 2.2.0
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
> > > >>>> > > >>
> > > >>>> > > >> b) Verify u r able to compile the distro
> > > >>>> > > >>
> > > >>>> > > >> mvn compile [passed with warnings]
> > > >>>> > > >>
> > > >>>> > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > > >>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > > >>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > > >>>> > > >> [WARNING] Multiple versions of scala libraries detected!
> > > >>>> > > >>
> > > >>>> > > >> c) Run through the unit tests: mvn clean test
> > > >>>> > > >> mvn clean test [passed]
> > > >>>> > > >>
> > > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > > >>>> > > >> Please run through all the different options in each script
> > > >>>> > > >>
> > > >>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
> > > >>>> > > >>
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > >>>> > > >> [...]
> > > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
> > > >>>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>>> > > >>     at java.lang.Class.forName0(Native Method)
> > > >>>> > > >>     at java.lang.Class.forName(Class.java:171)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > >>>> > > >>
> > > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
> > > >>>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>>> > > >>     at java.lang.Class.forName0(Native Method)
> > > >>>> > > >>     at java.lang.Class.forName(Class.java:171)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>>> > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > > >>>> > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> ./classify-20newsgroups.sh ->1 [works]
> > > >>>> > > >> ./classify-20newsgroups.sh ->2 [works]
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> cluster-reuters.sh ->1 [works]
> > > >>>> > > >> cluster-reuters.sh ->2 [works]
> > > >>>> > > >> cluster-reuters.sh ->3 [works]
> > > >>>> > > >>
> > > >>>> > > >> Same error as noted previously in the thread:
> > > >>>> > > >>
> > > >>>> > > >> cluster-reuters.sh ->4 [0 clusters]
> > > >>>> > > >>
> > > >>>> > > >> [...]
> > > >>>> > > >>
> > > >>>> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > > >>>> > > >> Num clusters: 0; maxDistance: 0.000000
> > > >>>> > > >> [Dunn Index] First: Infinity
> > > >>>> > > >> [Davies-Bouldin Index] First: NaN
> > > >>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > > >>>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew. I'll put a Release out soon.
On Wednesday, January 22, 2014 3:52 PM, Andrew Palumbo <ap...@outlook.com> wrote:
Everything seems to run well on my local machine:
Checked out revision 1560364.
CentOS 6
Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
Hadoop 2.2.0
mvn clean compile -DSkipTests [OK-Several Warnings]
mvn clean test [PASSED ALL]
mvn clean install -DskipTests [OK]
$MAHOUT_LOCAL=true
classify-20newsgroups.sh->1 [Accuracy 89.3529%]
classify-20newsgroups.sh->2 [Accuracy 90.8317%]
classify-20newsgroups.sh->3 [Accuracy 76.2746%]
classify-20newsgroups.sh->4 [cleans up]
cluster-reuters.sh->1 [20 clusters] -kmeans
cluster-reuters.sh->2 [INFO: 20 clusters] -fkmeans
cluster-reuters.sh->3 [OK] -lda
cluster-reuters.sh->4 [10 (9) clusters- see attached] -streaming kmeans
./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]
./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]
Attached is full output of cluster-reuters.sh->4 Streaming K-Means.
From cluster-reuters.sh->4 Streaming K-Means:
Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 1 [2816]: 3438.913758
Average distance in cluster 2 [112]: 20617.345993
Average distance in cluster 3 [4]: 32504.085379
Average distance in cluster 4 [435]: 18476.579935
Average distance in cluster 5 [27]: 21153.167574
Average distance in cluster 6 [15480]: 2040.864416
Average distance in cluster 7 [1711]: 5281.742482
Average distance in cluster 8 [964]: 15762.976239
Average distance in cluster 9 [28]: 19762.109632
Num clusters: 10; maxDistance: 107106.379648
[Dunn Index] First: 0.002272
[Davies-Bouldin Index] First: 57.871266
Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
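One way to sanity-check a summary like the one above is to compare the CSV rows against the "Average distance in cluster" log lines and to flag rows whose quantiles look odd. The following is only a minimal sketch, assuming Python 3 and using the first three clusters copied from the output above; the helper names (parse_log, check_rows) are hypothetical and not part of Mahout.

```python
import csv
import io
import re

# Excerpt copied from the output above (clusters 1-3 only, to keep the sketch short).
LOG_TEXT = """\
Average distance in cluster 1 [2816]: 3438.913758
Average distance in cluster 2 [112]: 20617.345993
Average distance in cluster 3 [4]: 32504.085379
"""

CSV_TEXT = """\
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
"""

LOG_PATTERN = re.compile(r"Average distance in cluster (\d+) \[(\d+)\]: ([\d.]+)")


def parse_log(text):
    """Map cluster id -> (count, mean) taken from the 'Average distance' lines."""
    return {
        int(m.group(1)): (int(m.group(2)), float(m.group(3)))
        for m in LOG_PATTERN.finditer(text)
    }


def check_rows(csv_text, log_text):
    """Return (mismatches, suspicious) as two lists of cluster ids.

    mismatches: CSV rows whose count/mean differ from the log lines.
    suspicious: rows with a negative distance.q0 or non-increasing quantiles.
    """
    logged = parse_log(log_text)
    mismatches, suspicious = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cid = int(row["cluster"])
        if logged.get(cid) != (int(row["count"]), float(row["distance.mean"])):
            mismatches.append(cid)
        quantiles = [float(row[f"distance.q{i}"]) for i in range(5)]
        if quantiles[0] < 0 or quantiles != sorted(quantiles):
            suspicious.append(cid)
    return mismatches, suspicious


mismatches, suspicious = check_rows(CSV_TEXT, LOG_TEXT)
print("mismatched clusters:", mismatches)
print("clusters with negative or unordered quantiles:", suspicious)
```

On this excerpt the counts and means agree with the log lines, while clusters 2 and 3 are flagged because their distance.q0 is negative (cluster 3 also has distance.q2 larger than distance.q3). Whether that matters for the release is a judgment call; the sketch only surfaces the rows.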
> From: ap.dev@outlook.com
> To: dev@mahout.apache.org; user@mahout.apache.org
> Subject: RE: MAHOUT 0.9 Release - New URL
> Date: Wed, 22 Jan 2014 09:37:06 -0500
>
> will do!
>
> > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > To: dev@mahout.apache.org; user@mahout.apache.org
> >
> > Andrew M., Andrew P. and others,
> >
> > Sebastian and I fixed a few issues today (for 0.9):
> >
> > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references/invocations to algorithms that have been removed from the codebase.
> > b) Fixed the issue with Streaming K-Means clustering and checked in the code.
> > c) Resurrected the Frequent Pattern Mining implementation for 0.9.
> >
> > Please check out the latest code from trunk, run a build locally, and run through the example scripts.
> >
> > Thanks and Regards.
> >
> >
> >
> >
> >
> >
> > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
> >
> > *factorize-movielens-1M.sh:*
> > RMSE is:
> >
> > 0.8519064098265133
> >
> >
> > Sample recommendations:
> >
> > 2229
> > [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > 5848
> > [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > 3728
> > [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > 1252
> > [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > 634
> > [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > 5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > 2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > 4219
> > [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > 91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > 502
> > [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> >
> > factorize-netflix.sh:
> > References a no-longer-available data set that Netflix took down after the
> > competition; the script should at least mention that the data set is no longer
> > "online".
> >
> >
> > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > > *cluster-syntheticcontrol.sh*
> > >
> > > *Canopy:*
> > > [snip]
> > > 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > > 10.017, 17.149, 14.850, 10.890]
> > > 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > > 16.430, 11.898, 15.366]
> > > 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > > 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > > 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > > 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > > 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > >
> > > *K-means:*
> > > [snip]
> > > 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > > 36.946, 36.900, 32.593, 31.563]
> > > 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > > 18.967, 35.473, 30.146, 45.527]
> > > 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > > 41.024, 37.105, 45.623, 43.589]
> > > 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > > 24.508, 34.432, 32.155, 34.839]
> > > 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > > 18.111, 14.754, 27.386, 27.026]
> > > 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > > 39.404, 38.034, 46.458, 44.432]
> > > 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > > 28.215, 34.940, 36.910, 39.749]
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > >
> > > *Fuzzy k-means:*
> > > [snip]
> > > 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > > 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > > 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > > 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > > 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > >
> > > *Dirichlet and Meanshift:*
> > > Already detailed in M-1400, deprecated jobs still referenced.
> > >
> > >
> > >
> > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > >> *cluster-reuters.sh*
> > >> *k-means:*
> > >>
> > >> [snip]
> > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > >> Top Terms:
> > >> banks =>
> > >> 3.841823268955143
> > >> bank =>
> > >> 3.80633066361209
> > >> debt =>
> > >> 3.28065219870794
> > >> said =>
> > >> 2.5965700942088583
> > >> he =>
> > >> 2.335682813857497
> > >> foreign =>
> > >> 2.2217853688201403
> > >> billion =>
> > >> 2.1970193848291335
> > >> would =>
> > >> 1.9932392063955617
> > >> loans =>
> > >> 1.9309276792854233
> > >> interest =>
> > >> 1.787324501938
> > >> have =>
> > >> 1.762981951432578
> > >> its =>
> > >> 1.7615109954971866
> > >> which =>
> > >> 1.5822081148036862
> > >> has =>
> > >> 1.5600708189041956
> > >> dlrs =>
> > >> 1.5571038313005996
> > >> finance =>
> > >> 1.5539758811252924
> > >> new =>
> > >> 1.5176015811577555
> > >> had =>
> > >> 1.5138723701401844
> > >> brazil =>
> > >> 1.5083369853593172
> > >> payments =>
> > >> 1.4539044255886517
> > >> Weight : [props - optional]: Point:
> > >>
> > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > >> Top Terms:
> > >> vs =>
> > >> 6.126130791333171
> > >> net =>
> > >> 4.012191567277523
> > >> cts =>
> > >> 3.822006848832744
> > >> shr =>
> > >> 3.6786004856764527
> > >> mln =>
> > >> 2.9011643584038698
> > >> loss =>
> > >> 2.788368861463607
> > >> qtr =>
> > >> 2.714140225051522
> > >> revs =>
> > >> 2.4739861236454717
> > >> profit =>
> > >> 1.8146888090247015
> > >> note =>
> > >> 1.7977163272138388
> > >> dlrs =>
> > >> 1.6164390808155846
> > >> avg =>
> > >> 1.3901765773336587
> > >> shrs =>
> > >> 1.3856326531419314
> > >> mths =>
> > >> 1.3168717272038506
> > >> 4th =>
> > >> 1.2161158425617289
> > >> oper =>
> > >> 1.182419473776814
> > >> year =>
> > >> 1.178086061733047
> > >> nine =>
> > >> 1.0670554836445316
> > >> 3rd =>
> > >> 1.041334410056592
> > >> inc =>
> > >> 1.0019361981554935
> > >> Weight : [props - optional]: Point:
> > >>
> > >>
> > >> Inter-Cluster Density: 0.45562152681859414
> > >> Intra-Cluster Density: 0.6952712632167628
> > >> CDbw Inter-Cluster Density: 0.0
> > >> CDbw Intra-Cluster Density: 16.486930227598684
> > >> CDbw Separation: 194.49005884464628
> > >>
> > >> *fuzzy k-means:*
> > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.005, 0.02:0.002, 0.0
> > >> Top Terms:
> > >> said =>
> > >> 1.8665592354713065
> > >> its =>
> > >> 1.1335212213411592
> > >> pct =>
> > >> 1.0862816801353348
> > >> dlrs =>
> > >> 1.0854998884993752
> > >> mln =>
> > >> 1.043163996400643
> > >> from =>
> > >> 0.9684961110525736
> > >> has =>
> > >> 0.912161511978058
> > >> company =>
> > >> 0.8754186972808333
> > >> mar =>
> > >> 0.8675333452422878
> > >> inc =>
> > >> 0.7678617590362815
> > >> would =>
> > >> 0.7610968883652675
> > >> he =>
> > >> 0.7459988770503974
> > >> which =>
> > >> 0.7435613119406804
> > >> year =>
> > >> 0.7302840632748394
> > >> u.s =>
> > >> 0.7281061062439116
> > >> shares =>
> > >> 0.7260764102983083
> > >> corp =>
> > >> 0.7179807367808658
> > >> new =>
> > >> 0.7044203783157115
> > >> stock =>
> > >> 0.6962010978721442
> > >> have =>
> > >> 0.6464265467298506
> > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.004, 0.02:0.002, 0.02
> > >> Top Terms:
> > >> said =>
> > >> 1.864911184196927
> > >> dlrs =>
> > >> 1.199286689822081
> > >> mln =>
> > >> 1.1802134783562215
> > >> pct =>
> > >> 1.1529704214798124
> > >> its =>
> > >> 1.1184398851519701
> > >> from =>
> > >> 1.016647848050332
> > >> company =>
> > >> 0.894703604722841
> > >> mar =>
> > >> 0.879986159541356
> > >> has =>
> > >> 0.8642799128491316
> > >> year =>
> > >> 0.8271823503717782
> > >> inc =>
> > >> 0.7871293745341424
> > >> corp =>
> > >> 0.737705498468879
> > >> which =>
> > >> 0.722975201852743
> > >> would =>
> > >> 0.708000816484415
> > >> u.s =>
> > >> 0.7073294276173905
> > >> billion =>
> > >> 0.7055723996916351
> > >> he =>
> > >> 0.7042684217823294
> > >> new =>
> > >> 0.6834737905434939
> > >> shares =>
> > >> 0.6753327384172428
> > >> stock =>
> > >> 0.6576225144041699
> > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.006, 0.02:0.002, 0.02
> > >> Top Terms:
> > >> said =>
> > >> 1.8796076179735086
> > >> its =>
> > >> 1.172025965452378
> > >> dlrs =>
> > >> 1.130422792460914
> > >> pct =>
> > >> 1.082038255241358
> > >> mln =>
> > >> 1.0772146872767114
> > >> company =>
> > >> 0.9662235879639138
> > >> from =>
> > >> 0.9473172871605616
> > >> has =>
> > >> 0.9224712965830099
> > >> mar =>
> > >> 0.8769325856924421
> > >> inc =>
> > >> 0.8360245257169788
> > >> shares =>
> > >> 0.8334595641384324
> > >> stock =>
> > >> 0.7704621839612175
> > >> corp =>
> > >> 0.7682400250301806
> > >> which =>
> > >> 0.7389988207856137
> > >> would =>
> > >> 0.7339708917389389
> > >> year =>
> > >> 0.7088414843731325
> > >> new =>
> > >> 0.7038109468655172
> > >> he =>
> > >> 0.6993994455501005
> > >> u.s =>
> > >> 0.6772649147622415
> > >> share =>
> > >> 0.6241804830055171
> > >>
> > >> *lda:*
> > >>
> > >> [snip]
> > >> 21539
> > >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > >> 21540
> > >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > >> 21541
> > >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > >> 21542
> > >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > >> 21543
> > >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > >> 21544
> > >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > >> 21545
> > >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > >> 21546
> > >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > >> 21547
> > >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > >> 21548
> > >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > >> 21549
> > >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > >> 21550
> > >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > >> 21551
> > >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > >> 21552
> > >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > >> 21553
> > >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > >> 21554
> > >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > >> 21555
> > >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > >> 21556
> > >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > >> 21557
> > >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > >> 21558
> > >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > >> 21559
> > >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > >> 21560
> > >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > >> 21561
> > >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > >> 21562
> > >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > >> 21563
> > >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > >> 21564
> > >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > >> 21565
> > >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > >> 21566
> > >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > >> 21567
> > >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > >> 21568
> > >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > >> 21569
> > >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > >> 21570
> > >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > >> 21571
> > >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > >> 21572
> > >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > >> 21573
> > >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > >> 21574
> > >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > >> 21575
> > >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > >> 21576
> > >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > >> 21577
> > >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > >>
> > >>
> > >> *Streaming k-means:*
> > >>
> > >> [snip]
> > >> INFO: Number of Centroids: 0
> > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > >> WARNING: job_local23982482_0001
> > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > >> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > >> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > >> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > >> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > >> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > >> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > >> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > >>
> > >> [snip]
> > >>
> > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index] First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
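The streaming k-means failure above reports "Number of Centroids: 0" just before the exception, so the train/test split has nothing to divide. The sketch below is a hypothetical Python stand-in for that kind of guarded split, not Mahout's actual BallKMeans code; the function name, the 0.1 default and the slicing are assumptions made for illustration. As far as I know, Guava's Preconditions.checkArgument substitutes only %s placeholders and appends unused arguments in square brackets, which would explain why "%.1f %% of %d" shows up unformatted in the log.

```python
def split_train_test(vectors, test_fraction=0.1):
    """Hypothetical guarded split: refuse to proceed with an empty train or test set."""
    num_test = int(len(vectors) * test_fraction)
    num_train = len(vectors) - num_test
    if num_test == 0 or num_train == 0:
        # Here the message is actually formatted, unlike the literal
        # "%.1f %% of %d" seen in the log above.
        raise ValueError(
            "Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test"
            % (test_fraction * 100, len(vectors))
        )
    return vectors[:num_train], vectors[num_train:]


if __name__ == "__main__":
    train, test = split_train_test(list(range(20)))
    print(len(train), len(test))
    try:
        split_train_test([])  # zero centroids, as in the failing run
    except ValueError as err:
        print(err)
```

With an empty input the guard fires immediately, which matches the symptom in the log: the real question is why the reducer received no centroids in the first place.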
> > >>
> > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> > >> andrew.musselman@gmail.com> wrote:
> > >>
> > >>> *classify-20newsgroups.sh*
> > >>>
> > >>> *Complementary naive bayes:*
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 11207 98.9406%
> > >>> Incorrectly Classified Instances : 120 1.0594%
> > >>> Total Classified Instances : 11327
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > >>> 475 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 | 478 a = alt.atheism
> > >>> 0 597 1 1 0 1 1 0 0 0 0 1 0 2 1 0 0 0 0 0 | 605 b = comp.graphics
> > >>> 0 1 620 3 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 | 627 c = comp.os.ms-windows.misc
> > >>> 1 1 1 593 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 | 599 d = comp.sys.ibm.pc.hardware
> > >>> 0 1 1 0 568 0 1 0 0 0 1 1 2 0 0 0 0 1 0 0 | 576 e = comp.sys.mac.hardware
> > >>> 0 4 2 0 0 581 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 587 f = comp.windows.x
> > >>> 0 0 0 1 2 0 571 3 0 0 1 1 4 1 0 0 0 0 0 0 | 584 g = misc.forsale
> > >>> 0 0 0 1 0 0 0 589 1 0 0 1 1 0 0 0 0 0 0 0 | 593 h = rec.autos
> > >>> 0 0 0 0 0 0 0 1 565 0 0 0 0 0 1 0 0 0 0 0 | 567 i = rec.motorcycles
> > >>> 0 0 0 0 0 0 0 0 0 600 2 0 0 0 1 0 0 0 0 0 | 603 j = rec.sport.baseball
> > >>> 0 0 0 0 0 0 0 0 0 1 584 0 0 0 0 0 0 0 0 0 | 585 k = rec.sport.hockey
> > >>> 0 0 0 0 0 0 0 0 0 0 0 579 0 0 0 0 0 1 0 0 | 580 l = sci.crypt
> > >>> 0 0 0 1 3 0 2 0 0 2 0 0 567 1 2 1 0 0 0 0 | 579 m = sci.electronics
> > >>> 0 0 0 0 0 0 0 0 0 0 0 0 1 605 0 0 0 0 0 0 | 606 n = sci.med
> > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 602 0 0 0 0 0 | 602 o = sci.space
> > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 602 0 0 1 0 | 604 p = soc.religion.christian
> > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 556 0 0 0 | 556 q = talk.politics.mideast
> > >>> 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 568 0 0 | 571 r = talk.politics.guns
> > >>> 11 0 0 0 0 0 0 0 0 1 0 0 0 1 3 8 1 4 338 2 | 369 s = talk.religion.misc
> > >>> 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 3 4 0 447 | 456 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.9806
> > >>> Accuracy 98.9406%
> > >>> Reliability 94.0932%
> > >>> Reliability (standard deviation) 0.2163
> > >>>
> > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Complementary Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 6715 89.3071%
> > >>> Incorrectly Classified Instances : 804 10.6929%
> > >>> Total Classified Instances : 7519
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > >>> 298 0 0 0 0 0 0 0 0 1 0 0 0 1 2 5 1 0 13 0 | 321 a = alt.atheism
> > >>> 0 298 11 6 1 12 2 2 1 1 3 8 3 4 2 4 1 4 4 1 | 368 b = comp.graphics
> > >>> 1 17 286 16 4 9 6 3 2 0 1 0 1 7 1 0 2 1 0 1 | 358 c = comp.os.ms-windows.misc
> > >>> 2 6 11 309 9 5 14 8 1 0 2 0 6 4 2 0 1 2 1 0 | 383 d = comp.sys.ibm.pc.hardware
> > >>> 0 10 8 7 334 7 5 5 2 0 3 0 2 1 1 0 1 1 0 0 | 387 e = comp.sys.mac.hardware
> > >>> 1 13 7 8 2 355 2 0 2 0 0 5 1 1 3 0 0 1 0 0 | 401 f = comp.windows.x
> > >>> 0 7 11 29 12 9 268 16 8 4 3 2 6 4 2 1 3 1 2 3 | 391 g = misc.forsale
> > >>> 0 1 0 0 3 0 7 362 8 2 2 1 2 0 2 0 1 2 0 4 | 397 h = rec.autos
> > >>> 0 0 0 1 0 0 1 0 423 0 0 0 2 1 0 1 0 0 0 0 | 429 i = rec.motorcycles
> > >>> 0 0 1 0 0 0 0 2 2 371 8 0 2 3 0 2 0 0 0 0 | 391 j = rec.sport.baseball
> > >>> 0 0 1 0 0 0 1 0 0 2 409 0 0 0 0 0 0 0 0 1 | 414 k = rec.sport.hockey
> > >>> 0 0 1 2 1 0 1 0 0 0 0 404 0 0 0 0 0 1 0 1 | 411 l = sci.crypt
> > >>> 0 5 4 11 1 3 7 9 2 5 3 3 339 2 6 0 1 1 2 1 | 405 m = sci.electronics
> > >>> 0 4 0 1 0 0 0 1 0 1 1 0 3 367 3 1 2 0 0 0 | 384 n = sci.med
> > >>> 0 1 2 0 1 0 2 0 0 1 0 0 1 1 375 0 1 0 0 0 | 385 o = sci.space
> > >>> 4 2 1 1 0 0 1 1 2 0 0 1 1 5 1 367 4 0 1 1 | 393 p = soc.religion.christian
> > >>> 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 2 378 0 1 0 | 384 q = talk.politics.mideast
> > >>> 0 0 0 0 0 2 1 1 1 1 0 3 0 3 0 0 2 319 2 4 | 339 r = talk.politics.guns
> > >>> 32 0 0 1 0 0 0 0 0 1 1 1 0 2 2 26 5 7 175 6 | 259 s = talk.religion.misc
> > >>> 0 0 0 2 0 0 0 0 0 1 2 2 0 1 2 1 10 18 2 278 | 319 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.8594
> > >>> Accuracy 89.3071%
> > >>> Reliability 84.611%
> > >>> Reliability (standard deviation) 0.2148
> > >>>
> > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> > >>>
> > >>>
> > >>> *Naive bayes:*
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 11286 99.0869%
> > >>> Incorrectly Classified Instances : 104 0.9131%
> > >>> Total Classified Instances : 11390
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > >>> 474 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 | 477 a = alt.atheism
> > >>> 0 566 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 569 b = comp.graphics
> > >>> 0 10 590 29 2 4 1 0 0 0 0 0 1 0 0 0 0 0 0 1 | 638 c = comp.os.ms-windows.misc
> > >>> 0 0 0 596 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 596 d = comp.sys.ibm.pc.hardware
> > >>> 0 0 0 0 575 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 | 577 e = comp.sys.mac.hardware
> > >>> 0 2 2 2 0 593 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 601 f = comp.windows.x
> > >>> 0 0 0 1 0 0 589 1 0 0 1 0 2 0 0 0 0 0 0 0 | 594 g = misc.forsale
> > >>> 0 0 0 0 0 0 0 594 0 0 0 0 0 0 0 0 0 0 0 0 | 594 h = rec.autos
> > >>> 0 0 0 0 0 0 0 0 611 0 0 0 0 0 0 0 0 0 0 0 | 611 i = rec.motorcycles
> > >>> 0 0 0 0 0 0 0 0 0 616 1 0 0 0 0 0 0 0 0 0 | 617 j = rec.sport.baseball
> > >>> 0 0 0 0 0 0 1 0 0 0 620 0 0 0 0 0 0 0 0 0 | 621 k = rec.sport.hockey
> > >>> 0 0 0 0 0 0 0 0 0 0 0 580 0 0 0 0 0 1 0 0 | 581 l = sci.crypt
> > >>> 0 0 0 3 1 0 0 0 0 0 0 0 571 0 0 0 0 0 0 0 | 575 m = sci.electronics
> > >>> 0 0 0 0 0 0 0 0 0 0 0 0 2 583 0 0 0 0 0 0 | 585 n = sci.med
> > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 1 599 0 0 0 0 0 | 600 o = sci.space
> > >>> 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 | 616 p = soc.religion.christian
> > >>> 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 560 0 0 0 | 562 q = talk.politics.mideast
> > >>> 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 548 0 1 | 551 r = talk.politics.guns
> > >>> 10 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 2 344 1 | 359 s = talk.religion.misc
> > >>> 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 2 0 462 | 466 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.9847
> > >>> Accuracy 99.0869%
> > >>> Reliability 94.3334%
> > >>> Reliability (standard deviation) 0.2169
> > >>>
> > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 6718 90.1019%
> > >>> Incorrectly Classified Instances : 738 9.8981%
> > >>> Total Classified Instances : 7456
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > >>> 294 0 0 0 0 0 0 0 0 0 0 2 0 1 1 6 1 1 16 0 | 322 a = alt.atheism
> > >>> 0 345 6 14 6 11 6 0 0 0 0 5 7 1 3 0 0 0 0 0 | 404 b = comp.graphics
> > >>> 2 29 177 78 22 19 9 1 0 0 0 4 2 0 1 1 0 0 1 1 | 347 c = comp.os.ms-windows.misc
> > >>> 1 9 2 335 18 2 10 0 0 0 1 0 8 0 0 0 0 0 0 0 | 386 d = comp.sys.ibm.pc.hardware
> > >>> 1 4 2 13 347 3 5 1 0 0 1 0 7 1 0 0 0 1 0 0 | 386 e = comp.sys.mac.hardware
> > >>> 0 20 0 4 0 352 4 0 0 0 0 0 1 1 3 0 1 0 1 0 | 387 f = comp.windows.x
> > >>> 0 2 0 21 5 1 323 7 2 2 0 2 12 0 3 0 0 0 0 1 | 381 g = misc.forsale
> > >>> 0 1 0 0 1 0 15 363 8 1 0 0 4 1 0 0 0 1 0 1 | 396 h = rec.autos
> > >>> 0 1 0 0 0 0 6 6 370 0 0 0 0 1 0 0 0 0 1 0 | 385 i = rec.motorcycles
> > >>> 1 0 0 1 1 0 2 1 2 362 5 0 2 0 0 0 0 0 0 0 | 377 j = rec.sport.baseball
> > >>> 0 0 0 1 2 0 0 0 0 3 371 0 0 0 0 0 0 0 0 1 | 378 k = rec.sport.hockey
> > >>> 0 3 1 0 1 0 2 0 0 0 0 396 0 1 0 0 1 1 1 3 | 410 l = sci.crypt
> > >>> 0 7 0 7 7 2 6 4 0 0 0 1 369 2 2 0 0 0 0 2 | 409 m = sci.electronics
> > >>> 0 3 0 2 1 0 2 0 0 0 0 1 4 383 4 0 0 1 0 4 | 405 n = sci.med
> > >>> 0 5 0 0 1 0 3 0 0 0 0 0 1 0 374 1 0 0 1 1 | 387 o = sci.space
> > >>> 6 2 0 1 1 0 0 1 0 1 0 0 1 5 0 352 2 1 7 1 | 381 p = soc.religion.christian
> > >>> 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 373 1 0 1 | 378 q = talk.politics.mideast
> > >>> 0 0 0 0 0 0 1 0 1 0 0 2 0 0 0 0 0 346 2 7 | 359 r = talk.politics.guns
> > >>> 26 1 0 1 0 0 0 2 0 1 1 0 0 1 1 20 2 6 200 7 | 269 s = talk.religion.misc
> > >>> 1 0 0 0 0 0 0 2 0 0 1 0 0 2 2 0 1 14 0 286 | 309 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.8726
> > >>> Accuracy 90.1019%
> > >>> Reliability 85.4491%
> > >>> Reliability (standard deviation) 0.2222
> > >>>
> > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> > >>>
> > >>> *SGD:*
> > >>> 7532 test files
> > >>>
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 5649 75%
> > >>> Incorrectly Classified Instances : 1883 25%
> > >>> Total Classified Instances : 7532
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > >>> 186 6 3 10 5 0 33 4 13 15 7 1 24 15 3 15 5 5 29 15 | 394 a = sci.space
> > >>> 5 309 0 3 2 5 0 0 0 1 9 21 2 0 0 18 4 4 1 1 | 385 b = comp.sys.mac.hardware
> > >>> 4 1 101 3 0 1 63 0 7 0 1 1 5 16 3 0 3 7 1 34 | 251 c = talk.religion.misc
> > >>> 11 12 1 265 1 10 3 0 0 17 10 11 5 2 0 11 3 6 21 0 | 389 d = comp.graphics
> > >>> 2 1 1 0 349 2 3 0 3 2 6 1 5 1 0 2 15 2 1 2 | 398 e = rec.motorcycles
> > >>> 7 20 3 19 2 254 6 0 2 11 2 39 7 2 0 4 2 2 9 3 | 394 f = comp.os.ms-windows.misc
> > >>> 2 1 13 0 0 0 247 0 1 1 3 0 6 2 4 0 2 3 5 29 | 319 g = alt.atheism
> > >>> 1 1 0 0 2 0 2 361 0 1 2 0 2 0 0 1 3 22 0 1 | 399 h = rec.sport.hockey
> > >>> 3 0 3 1 0 0 5 0 161 0 1 2 12 102 0 0 1 2 11 6 | 310 i = talk.politics.misc
> > >>> 2 8 0 19 0 19 0 0 1 294 10 11 4 2 0 5 0 3 11 6 | 395 j = comp.windows.x
> > >>> 2 10 0 1 1 0 0 0 0 1 347 13 2 1 0 5 3 2 2 0 | 390 k = misc.forsale
> > >>> 1 36 0 6 1 25 0 0 1 6 10 257 2 1 0 34 6 0 6 0 | 392 l = comp.sys.ibm.pc.hardware
> > >>> 2 2 2 2 1 0 12 0 0 6 10 4 312 5 2 13 11 3 3 6 | 396 m = sci.med
> > >>> 2 0 3 2 1 0 0 1 13 0 5 1 2 314 2 0 2 2 10 4 | 364 n = talk.politics.guns
> > >>> 1 0 2 1 1 0 34 1 33 1 3 0 1 8 271 1 4 5 6 3 | 376 o = talk.politics.mideast
> > >>> 3 14 0 8 2 8 3 1 1 7 12 29 6 2 1 245 13 2 32 4 | 393 p = sci.electronics
> > >>> 3 3 0 2 11 0 1 0 2 1 11 6 4 2 0 11 330 4 4 1 | 396 q = rec.autos
> > >>> 0 0 1 0 1 0 4 12 3 1 3 0 0 0 0 5 6 359 1 1 | 397 r = rec.sport.baseball
> > >>> 0 1 0 0 0 1 0 0 3 3 0 0 3 2 1 6 1 6 366 3 | 396 s = sci.crypt
> > >>> 0 2 11 1 1 0 40 0 1 2 3 4 2 1 0 5 0 2 2 321 | 398 t = soc.religion.christian
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.7073
> > >>> Accuracy 75%
> > >>> Reliability 70.6238%
> > >>> Reliability (standard deviation) 0.2187
> > >>> Log-likelihood mean : -1.1182
> > >>> 25%-ile : -1.6911
> > >>> 75%-ile : -0.0803
> > >>>
> > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
> > >>>
> > >>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
> > >>>> and a few other issues.
> > >>>>
> > >>>> We have removed asf-examples*.sh from trunk because the sample file at the
> > >>>> URL mentioned in your email is no longer available.
> > >>>> This is something we need to fix and restore in 1.0.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> > >>>> ap.dev@outlook.com> wrote:
> > >>>>
> > >>>> from the asf-email-examples.sh script:
> > >>>>
> > >>>> # You will need to download or otherwise obtain some or all of the
> > >>>> # Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
> > >>>> # to use this script.
> > >>>> # To obtain a full copy you will need to launch an EC2 instance and
> > >>>> # mount the dataset to download it, otherwise you can get a sample of it at
> > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> It looks like the:
> > >>>>
> > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> link is down.
> > >>>>
> > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >>>> > From: andrew.musselman@gmail.com
> > >>>> > To: dev@mahout.apache.org
> > >>>> >
> > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > >>>> >
> > >>>> >
> > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> > >>>> suneel_marthi@yahoo.com>wrote:
> > >>>> >
> > >>>> > > Thanks, Andrew M. I see that some of the example scripts need to be fixed,
> > >>>> > > as they still refer to the deprecated algorithms.
> > >>>> > > I also see that Streaming KMeans has failed for you as well.
> > >>>> > >
> > >>>> > > I'll be rolling back the release today to fix these issues.
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > >>>> > > Linux AMI from tarball.
> > >>>> > >
> > >>>> > > All tests pass.
> > >>>> > >
> > >>>> > > *Output of examples:*
> > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > >>>> > > <http://mahout.apache.org>:*
> > >>>> > > *recommendations:*
> > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > >>>> > > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > >>>> > > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > >>>> > > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > >>>> > > 8 [12758:1.0,19409:1.0,11112:1.0]
> > >>>> > > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > >>>> > > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > >>>> > > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > >>>> > > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > >>>> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > >>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > >>>> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; kmeans:*
> > >>>> > > [snip]
> > >>>> > > Weight : [props - optional]: Point:
> > >>>> > > 1.0 :
> > >>>> > > [distance-squared=1.0193102046188427]:
> > >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > >>>> 7573:0.204,
> > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> > >>>> 9779:0.159,
> > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > >>>> > > 1.0 : [distance-squared=0.9823018320457279]:
> > >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > >>>> 5336:0.106,
> > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> > >>>> 7832:0.072,
> > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > >>>> > > 1.0 : [distance-squared=0.9509142993214911]:
> > >>>> > >
> > >>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > >>>> > > 4419:0.076,
> > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > >>>> 7235:0.048,
> > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > >>>> 7683:0.077,
> > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > >>>> 10225:0.081,
> > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > >>>> > > 43685:0.086, 44077:0.308,
> > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; dirichlet:*
> > >>>> > > Get this complaint:
> > >>>> > > Running Dirichlet with K = 8
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'dirichlet' chosen.
> > >>>> > >
> > >>>> > > *clustering: minhash:*
> > >>>> > > Running Minhash
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'minhash' chosen.
> > >>>> > >
> > >>>> > > *classification; standard:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances : 5384 87.7874%
> > >>>> > > Incorrectly Classified Instances : 749 12.2126%
> > >>>> > > Total Classified Instances : 6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a b c d <--Classified as
> > >>>> > > 2949 7 531 25 | 3512 a = dev
> > >>>> > > 0 0 0 0 | 0 b = general
> > >>>> > > 99 8 1763 8 | 1878 c = user
> > >>>> > > 41 1 29 672 | 743 d = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa 0.7877
> > >>>> > > Accuracy 87.7874%
> > >>>> > > Reliability 53.658%
> > >>>> > > Reliability (standard deviation) 0.4911
> > >>>> > >
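The summary figures for the standard classifier can be cross-checked from the four-class confusion matrix in the log above. The sketch below recomputes accuracy and the textbook Cohen's kappa in Python; the helper names are made up for illustration and this is not Mahout's own code. By my arithmetic the textbook kappa for this matrix comes out near 0.790, slightly above the 0.7877 printed in the log, so Mahout's figure may be computed somewhat differently.

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Textbook Cohen's kappa: (observed - expected) / (1 - expected)."""
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Rows are the actual classes dev, general, user, commits, copied from the log.
standard_nb = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]

if __name__ == "__main__":
    print("accuracy: %.4f%%" % (100 * accuracy(standard_nb)))
    print("kappa (textbook): %.4f" % cohen_kappa(standard_nb))
```

The accuracy line reproduces the 87.7874% in the summary, which confirms the matrix was transcribed correctly even though the kappa values differ slightly.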
> > >>>> > > *classification; complementary:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances : 5530 90.1679%
> > >>>> > > Incorrectly Classified Instances : 603 9.8321%
> > >>>> > > Total Classified Instances : 6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a b c d <--Classified as
> > >>>> > > 3168 0 276 68 | 3512 a = dev
> > >>>> > > 0 0 0 0 | 0 b = general
> > >>>> > > 196 0 1652 30 | 1878 c = user
> > >>>> > > 25 0 8 710 | 743 d = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa 0.8259
> > >>>> > > Accuracy 90.1679%
> > >>>> > > Reliability 54.7459%
> > >>>> > > Reliability (standard deviation) 0.5005
> > >>>> > >
> > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > >>>> > >
> > >>>> > > *classification; sgd, with three categories:*
> > >>>> > > Running SGD Training
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > >>>> > > 24168 training files
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> > >>>> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> > >>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> > >>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
> > >>>> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> > >>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> > >>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> > >>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> > >>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> > >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > >>>> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > >>>> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >>>> > > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >>>> > > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > > at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > > at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > >>>> > > at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > >>>> > > at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > >>>> > > at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > >>>> > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > >>>> > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > >>>> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >>>> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>> > > at java.lang.Thread.run(Thread.java:701)
> > >>>> > >
> > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > > Trying out the build today
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > >>>> > > >
> > >>>> > > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > >>>> > > >> Release, will be rerolling the release today (in the next few hrs) and
> > >>>> > > >> putting out a new release candidate in staging.
> > >>>> > > >>
> > >>>> > > >> Thanks for reporting this Andrew P.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
> > >>>> > > >>
> > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
> > >>>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > >>>> > > >> have run into some problems because of the Hadoop setup. Ran into some
> > >>>> > > >> problems in the example scripts, particularly with
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> > >>>> "amd64",
> > >>>> > > >> family: "unix"
> > >>>> > > >> $MAHOUT_LOCAL=true
> > >>>> > > >> Hadoop 2.2.0
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
> > >>>> > > >>
> > >>>> > > >> b) Verify u r able to compile the distro
> > >>>> > > >>
> > >>>> > > >> mvn compile- [passed with warnings]
> > >>>> > > >>
> > >>>> > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > >>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > >>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > >>>> > > >> [WARNING] Multiple versions of scala libraries detected!
> > >>>> > > >>
> > >>>> > > >> c) Run through the unit tests: mvn clean test
> > >>>> > > >> mvn clean test [passed]
> > >>>> > > >>
> > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >>>> > > >> Please run through all the different options in each script
> > >>>> > > >>
> > >>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>>> > > >> [...]
> > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >> at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >> at java.lang.Class.forName0(Native Method)
> > >>>> > > >> at java.lang.Class.forName(Class.java:171)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>>> > > >>
> > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >> at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >> at java.lang.Class.forName0(Native Method)
> > >>>> > > >> at java.lang.Class.forName(Class.java:171)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > >>>> > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./classify-20newsgroups.sh ->1 [works]
> > >>>> > > >> ./classify-20newsgroups.sh ->2 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> cluster-reuters.sh ->1 [works]
> > >>>> > > >> cluster-reuters.sh ->2 [works]
> > >>>> > > >> cluster-reuters.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >> Same error as noted previously in the thread:
> > >>>> > > >>
> > >>>> > > >> cluster-reuters.sh ->4 [0 clusters]
> > >>>> > > >>
> > >>>> > > >> [...]
> > >>>> > > >>
> > >>>> > > >> WARNING: No qualcluster.props found on classpath, will use
> > >>>> > > >> command-line arguments only
> > >>>> > > >> Num clusters: 0; maxDistance: 0.000000
> > >>>> > > >> [Dunn Index] First: Infinity
> > >>>> > > >> [Davies-Bouldin Index] First: NaN
> > >>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > >>>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >>>> > > >> > From: suneel_marthi@yahoo.com
> > >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >>>> > > >> >
> > >>>> > > >> > Third time's a Charm!!!
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> > >>>> > > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >>>> > > >> >
> > >>>> > > >> > For those volunteering to test this, some of the things to be verified:
> > >>>> > > >> >
> > >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> > >>>> > > >> > b) Verify u r able to compile the distro
> > >>>> > > >> > c) Run through the unit tests: mvn clean test
> > >>>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Committers
> > >>>> > > >> > and PMC members:
> > >>>> > > >> > ---------------------------------------
> > >>>> > > >> >
> > >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Thanks and Regards.
> > >>>> > > >>
> > >>>> > > >
> > >>>> > > >
> > >>>> > >
> > >>>>
> > >>>
> > >>>
> > >>
> > >
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew. I'll put a Release out soon.
On Wednesday, January 22, 2014 3:52 PM, Andrew Palumbo <ap...@outlook.com> wrote:
Everything seems to run well on my local machine:
Checked out revision 1560364.
CentOS 6
Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
Hadoop 2.2.0
mvn clean compile -DskipTests [OK-Several Warnings]
mvn clean test [PASSED ALL]
mvn clean install -DskipTests [OK]
$MAHOUT_LOCAL=true
classify-20newsgroups.sh->1 [Accuracy 89.3529%]
classify-20newsgroups.sh->2 [Accuracy 90.8317%]
classify-20newsgroups.sh->3 [Accuracy 76.2746%]
classify-20newsgroups.sh->4 [cleans up]
cluster-reuters.sh->1 [20 clusters] -kmeans
cluster-reuters.sh->2 [INFO: 20 clusters] -fkmeans
cluster-reuters.sh->3 [OK] -lda
cluster-reuters.sh->4 [10 (9) clusters- see attached] -streaming kmeans
./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]
./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]
Attached is the full output of cluster-reuters.sh->4 Streaming K-Means.
From cluster-reuters.sh->4 Streaming K-Means:
Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 1 [2816]: 3438.913758
Average distance in cluster 2 [112]: 20617.345993
Average distance in cluster 3 [4]: 32504.085379
Average distance in cluster 4 [435]: 18476.579935
Average distance in cluster 5 [27]: 21153.167574
Average distance in cluster 6 [15480]: 2040.864416
Average distance in cluster 7 [1711]: 5281.742482
Average distance in cluster 8 [964]: 15762.976239
Average distance in cluster 9 [28]: 19762.109632
Num clusters: 10; maxDistance: 107106.379648
[Dunn Index] First: 0.002272
[Davies-Bouldin Index] First: 57.871266
Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
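One way to sanity-check the summary table above is to load it as CSV, add up the per-cluster counts, and list any clusters whose minimum distance (distance.q0) is negative, since a distance should not be below zero. The Python sketch below does that with the standard library only; the rows are copied from the output above, and the helper names are illustrative, not part of Mahout.

```python
import csv
import io

# Summary rows copied verbatim from the cluster-reuters.sh->4 output above.
SUMMARY = """\
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
"""


def load_rows(text):
    """Parse the CSV summary into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def total_points(rows):
    """Sum of the 'count' column across all listed clusters."""
    return sum(int(row["count"]) for row in rows)


def negative_min_clusters(rows):
    """Cluster ids whose reported minimum distance (distance.q0) is negative."""
    return [int(row["cluster"]) for row in rows if float(row["distance.q0"]) < 0]


rows = load_rows(SUMMARY)
print(total_points(rows))           # points in clusters 1..9
print(negative_min_clusters(rows))  # clusters worth a second look
```

The nine listed clusters hold 21577 points; adding the single point reported for cluster 0 gives 21578. The thread does not say why some distance.q0 values are negative, so the second function only flags them.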
> From: ap.dev@outlook.com
> To: dev@mahout.apache.org; user@mahout.apache.org
> Subject: RE: MAHOUT 0.9 Release - New URL
> Date: Wed, 22 Jan 2014 09:37:06 -0500
>
> will do!
>
> > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > To: dev@mahout.apache.org; user@mahout.apache.org
> >
> > Andrew M., Andrew P. and others,
> >
> > Sebastian and I fixed a few issues today (for 0.9):
> >
> > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references to, and invocations of, algorithms that have been removed from the codebase.
> > b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
> > c) Resurrected Frequent Pattern Mining implementation for 0.9.
> >
> > Please check out the latest code from trunk, run a build locally and run through the example scripts.
> >
> > Thanks and Regards.
> >
> >
> >
> >
> >
> >
> > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
> >
> > *factorize-movielens-1M.sh:*
> > RMSE is:
> >
> > 0.8519064098265133
> >
> >
> > Sample recommendations:
> >
> > 2229
> > [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > 5848
> > [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > 3728
> > [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > 1252
> > [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > 634
> > [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > 5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > 2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > 4219
> > [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > 91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > 502
> > [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> >
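For anyone wanting to compare these sample recommendations across runs, the lines can be parsed into structured data. The Python sketch below assumes each record is a user ID followed by a bracketed list of itemID:score pairs (that reading of the format is an assumption, as the thread does not describe it), and tolerates the ID and the list being wrapped onto separate lines. The function name is hypothetical.

```python
def parse_recommendation(text):
    """Parse 'userID [item:score,item:score,...]' into (user_id, [(item, score), ...]).

    Whitespace, including a line break between the ID and the list, is collapsed first.
    """
    normalized = " ".join(text.split())
    user_part, items_part = normalized.split(" ", 1)
    pairs = []
    for pair in items_part.strip("[]").split(","):
        item, score = pair.split(":")
        pairs.append((int(item), float(score)))
    return int(user_part), pairs


# One single-line record and one wrapped record, both copied from the output above.
single = "91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]"
wrapped = "634\n[572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]"

print(parse_recommendation(single))
print(parse_recommendation(wrapped))
```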
> > factorize-netflix.sh:
> > References a no-longer-available data set that Netflix took down after the
> > competition; the script should at least mention that the data set is no
> > longer online.
> >
> >
> > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > > *cluster-syntheticcontrol.sh*
> > >
> > > *Canopy:*
> > > [snip]
> > > 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > > 10.017, 17.149, 14.850, 10.890]
> > > 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > > 16.430, 11.898, 15.366]
> > > 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > > 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > > 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > > 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > > 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > >
> > > *K-means:*
> > > [snip]
> > > 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > > 36.946, 36.900, 32.593, 31.563]
> > > 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > > 18.967, 35.473, 30.146, 45.527]
> > > 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > > 41.024, 37.105, 45.623, 43.589]
> > > 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > > 24.508, 34.432, 32.155, 34.839]
> > > 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > > 18.111, 14.754, 27.386, 27.026]
> > > 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > > 39.404, 38.034, 46.458, 44.432]
> > > 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > > 28.215, 34.940, 36.910, 39.749]
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > >
> > > *Fuzzy k-means:*
> > > [snip]
> > > 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > > 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > > 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > > 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > > 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > >
> > > *Dirichlet and Meanshift:*
> > > Already detailed in M-1400: the deprecated jobs are still referenced.
> > >
> > >
> > >
> > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > >> *cluster-reuters.sh*
> > >> *k-means:*
> > >>
> > >> [snip]
> > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > >> Top Terms:
> > > >> banks => 3.841823268955143
> > > >> bank => 3.80633066361209
> > > >> debt => 3.28065219870794
> > > >> said => 2.5965700942088583
> > > >> he => 2.335682813857497
> > > >> foreign => 2.2217853688201403
> > > >> billion => 2.1970193848291335
> > > >> would => 1.9932392063955617
> > > >> loans => 1.9309276792854233
> > > >> interest => 1.787324501938
> > > >> have => 1.762981951432578
> > > >> its => 1.7615109954971866
> > > >> which => 1.5822081148036862
> > > >> has => 1.5600708189041956
> > > >> dlrs => 1.5571038313005996
> > > >> finance => 1.5539758811252924
> > > >> new => 1.5176015811577555
> > > >> had => 1.5138723701401844
> > > >> brazil => 1.5083369853593172
> > > >> payments => 1.4539044255886517
> > >> Weight : [props - optional]: Point:
> > >>
> > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > >> Top Terms:
> > > >> vs => 6.126130791333171
> > > >> net => 4.012191567277523
> > > >> cts => 3.822006848832744
> > > >> shr => 3.6786004856764527
> > > >> mln => 2.9011643584038698
> > > >> loss => 2.788368861463607
> > > >> qtr => 2.714140225051522
> > > >> revs => 2.4739861236454717
> > > >> profit => 1.8146888090247015
> > > >> note => 1.7977163272138388
> > > >> dlrs => 1.6164390808155846
> > > >> avg => 1.3901765773336587
> > > >> shrs => 1.3856326531419314
> > > >> mths => 1.3168717272038506
> > > >> 4th => 1.2161158425617289
> > > >> oper => 1.182419473776814
> > > >> year => 1.178086061733047
> > > >> nine => 1.0670554836445316
> > > >> 3rd => 1.041334410056592
> > > >> inc => 1.0019361981554935
> > >> Weight : [props - optional]: Point:
> > >>
> > >>
> > >> Inter-Cluster Density: 0.45562152681859414
> > >> Intra-Cluster Density: 0.6952712632167628
> > >> CDbw Inter-Cluster Density: 0.0
> > >> CDbw Intra-Cluster Density: 16.486930227598684
> > >> CDbw Separation: 194.49005884464628
> > >>
> > >> *fuzzy k-means:*
> > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.005, 0.02:0.002, 0.0
> > >> Top Terms:
> > > >> said => 1.8665592354713065
> > > >> its => 1.1335212213411592
> > > >> pct => 1.0862816801353348
> > > >> dlrs => 1.0854998884993752
> > > >> mln => 1.043163996400643
> > > >> from => 0.9684961110525736
> > > >> has => 0.912161511978058
> > > >> company => 0.8754186972808333
> > > >> mar => 0.8675333452422878
> > > >> inc => 0.7678617590362815
> > > >> would => 0.7610968883652675
> > > >> he => 0.7459988770503974
> > > >> which => 0.7435613119406804
> > > >> year => 0.7302840632748394
> > > >> u.s => 0.7281061062439116
> > > >> shares => 0.7260764102983083
> > > >> corp => 0.7179807367808658
> > > >> new => 0.7044203783157115
> > > >> stock => 0.6962010978721442
> > > >> have => 0.6464265467298506
> > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.004, 0.02:0.002, 0.02
> > >> Top Terms:
> > > >> said => 1.864911184196927
> > > >> dlrs => 1.199286689822081
> > > >> mln => 1.1802134783562215
> > > >> pct => 1.1529704214798124
> > > >> its => 1.1184398851519701
> > > >> from => 1.016647848050332
> > > >> company => 0.894703604722841
> > > >> mar => 0.879986159541356
> > > >> has => 0.8642799128491316
> > > >> year => 0.8271823503717782
> > > >> inc => 0.7871293745341424
> > > >> corp => 0.737705498468879
> > > >> which => 0.722975201852743
> > > >> would => 0.708000816484415
> > > >> u.s => 0.7073294276173905
> > > >> billion => 0.7055723996916351
> > > >> he => 0.7042684217823294
> > > >> new => 0.6834737905434939
> > > >> shares => 0.6753327384172428
> > > >> stock => 0.6576225144041699
> > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.006, 0.02:0.002, 0.02
> > >> Top Terms:
> > > >> said => 1.8796076179735086
> > > >> its => 1.172025965452378
> > > >> dlrs => 1.130422792460914
> > > >> pct => 1.082038255241358
> > > >> mln => 1.0772146872767114
> > > >> company => 0.9662235879639138
> > > >> from => 0.9473172871605616
> > > >> has => 0.9224712965830099
> > > >> mar => 0.8769325856924421
> > > >> inc => 0.8360245257169788
> > > >> shares => 0.8334595641384324
> > > >> stock => 0.7704621839612175
> > > >> corp => 0.7682400250301806
> > > >> which => 0.7389988207856137
> > > >> would => 0.7339708917389389
> > > >> year => 0.7088414843731325
> > > >> new => 0.7038109468655172
> > > >> he => 0.6993994455501005
> > > >> u.s => 0.6772649147622415
> > > >> share => 0.6241804830055171
> > >>
> > >> *lda:*
> > >>
> > >> [snip]
> > >> 21539
> > >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > >> 21540
> > >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > >> 21541
> > >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > >> 21542
> > >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > >> 21543
> > >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > >> 21544
> > >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > >> 21545
> > >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > >> 21546
> > >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > >> 21547
> > >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > >> 21548
> > >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > >> 21549
> > >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > >> 21550
> > >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > >> 21551
> > >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > >> 21552
> > >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > >> 21553
> > >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > >> 21554
> > >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > >> 21555
> > >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > >> 21556
> > >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > >> 21557
> > >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > >> 21558
> > >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > >> 21559
> > >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > >> 21560
> > >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > >> 21561
> > >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > >> 21562
> > >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > >> 21563
> > >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > >> 21564
> > >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > >> 21565
> > >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > >> 21566
> > >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > >> 21567
> > >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > >> 21568
> > >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > >> 21569
> > >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > >> 21570
> > >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > >> 21571
> > >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > >> 21572
> > >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > >> 21573
> > >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > >> 21574
> > >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > >> 21575
> > >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > >> 21576
> > >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > >> 21577
> > >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > >>
> > >>
> > >> *Streaming k-means:*
> > >>
> > >> [snip]
> > >> INFO: Number of Centroids: 0
> > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > >> WARNING: job_local23982482_0001
> > > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > > >> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > > >> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > > >> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > > >> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > > >> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > > >> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > > >> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > >>
> > >> [snip]
> > >>
> > >> WARNING: No qualcluster.props found on classpath, will use command-line
> > >> arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index] First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
> > >>
> > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> > >> andrew.musselman@gmail.com> wrote:
> > >>
> > >>> *classify-20newsgroups.sh*
> > >>>
> > >>> *Complementary naive bayes:*
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 11207 98.9406%
> > >>> Incorrectly Classified Instances : 120 1.0594%
> > >>> Total Classified Instances : 11327
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > > >>> 475 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 | 478 a = alt.atheism
> > > >>> 0 597 1 1 0 1 1 0 0 0 0 1 0 2 1 0 0 0 0 0 | 605 b = comp.graphics
> > > >>> 0 1 620 3 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 | 627 c = comp.os.ms-windows.misc
> > > >>> 1 1 1 593 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 | 599 d = comp.sys.ibm.pc.hardware
> > > >>> 0 1 1 0 568 0 1 0 0 0 1 1 2 0 0 0 0 1 0 0 | 576 e = comp.sys.mac.hardware
> > > >>> 0 4 2 0 0 581 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 587 f = comp.windows.x
> > > >>> 0 0 0 1 2 0 571 3 0 0 1 1 4 1 0 0 0 0 0 0 | 584 g = misc.forsale
> > > >>> 0 0 0 1 0 0 0 589 1 0 0 1 1 0 0 0 0 0 0 0 | 593 h = rec.autos
> > > >>> 0 0 0 0 0 0 0 1 565 0 0 0 0 0 1 0 0 0 0 0 | 567 i = rec.motorcycles
> > > >>> 0 0 0 0 0 0 0 0 0 600 2 0 0 0 1 0 0 0 0 0 | 603 j = rec.sport.baseball
> > > >>> 0 0 0 0 0 0 0 0 0 1 584 0 0 0 0 0 0 0 0 0 | 585 k = rec.sport.hockey
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 579 0 0 0 0 0 1 0 0 | 580 l = sci.crypt
> > > >>> 0 0 0 1 3 0 2 0 0 2 0 0 567 1 2 1 0 0 0 0 | 579 m = sci.electronics
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 0 1 605 0 0 0 0 0 0 | 606 n = sci.med
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 602 0 0 0 0 0 | 602 o = sci.space
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 602 0 0 1 0 | 604 p = soc.religion.christian
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 556 0 0 0 | 556 q = talk.politics.mideast
> > > >>> 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 568 0 0 | 571 r = talk.politics.guns
> > > >>> 11 0 0 0 0 0 0 0 0 1 0 0 0 1 3 8 1 4 338 2 | 369 s = talk.religion.misc
> > > >>> 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 3 4 0 447 | 456 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.9806
> > >>> Accuracy 98.9406%
> > >>> Reliability 94.0932%
> > >>> Reliability (standard deviation) 0.2163
> > >>>
> > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Complementary Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 6715 89.3071%
> > >>> Incorrectly Classified Instances : 804 10.6929%
> > >>> Total Classified Instances : 7519
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > > >>> 298 0 0 0 0 0 0 0 0 1 0 0 0 1 2 5 1 0 13 0 | 321 a = alt.atheism
> > > >>> 0 298 11 6 1 12 2 2 1 1 3 8 3 4 2 4 1 4 4 1 | 368 b = comp.graphics
> > > >>> 1 17 286 16 4 9 6 3 2 0 1 0 1 7 1 0 2 1 0 1 | 358 c = comp.os.ms-windows.misc
> > > >>> 2 6 11 309 9 5 14 8 1 0 2 0 6 4 2 0 1 2 1 0 | 383 d = comp.sys.ibm.pc.hardware
> > > >>> 0 10 8 7 334 7 5 5 2 0 3 0 2 1 1 0 1 1 0 0 | 387 e = comp.sys.mac.hardware
> > > >>> 1 13 7 8 2 355 2 0 2 0 0 5 1 1 3 0 0 1 0 0 | 401 f = comp.windows.x
> > > >>> 0 7 11 29 12 9 268 16 8 4 3 2 6 4 2 1 3 1 2 3 | 391 g = misc.forsale
> > > >>> 0 1 0 0 3 0 7 362 8 2 2 1 2 0 2 0 1 2 0 4 | 397 h = rec.autos
> > > >>> 0 0 0 1 0 0 1 0 423 0 0 0 2 1 0 1 0 0 0 0 | 429 i = rec.motorcycles
> > > >>> 0 0 1 0 0 0 0 2 2 371 8 0 2 3 0 2 0 0 0 0 | 391 j = rec.sport.baseball
> > > >>> 0 0 1 0 0 0 1 0 0 2 409 0 0 0 0 0 0 0 0 1 | 414 k = rec.sport.hockey
> > > >>> 0 0 1 2 1 0 1 0 0 0 0 404 0 0 0 0 0 1 0 1 | 411 l = sci.crypt
> > > >>> 0 5 4 11 1 3 7 9 2 5 3 3 339 2 6 0 1 1 2 1 | 405 m = sci.electronics
> > > >>> 0 4 0 1 0 0 0 1 0 1 1 0 3 367 3 1 2 0 0 0 | 384 n = sci.med
> > > >>> 0 1 2 0 1 0 2 0 0 1 0 0 1 1 375 0 1 0 0 0 | 385 o = sci.space
> > > >>> 4 2 1 1 0 0 1 1 2 0 0 1 1 5 1 367 4 0 1 1 | 393 p = soc.religion.christian
> > > >>> 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 2 378 0 1 0 | 384 q = talk.politics.mideast
> > > >>> 0 0 0 0 0 2 1 1 1 1 0 3 0 3 0 0 2 319 2 4 | 339 r = talk.politics.guns
> > > >>> 32 0 0 1 0 0 0 0 0 1 1 1 0 2 2 26 5 7 175 6 | 259 s = talk.religion.misc
> > > >>> 0 0 0 2 0 0 0 0 0 1 2 2 0 1 2 1 10 18 2 278 | 319 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.8594
> > >>> Accuracy 89.3071%
> > >>> Reliability 84.611%
> > >>> Reliability (standard deviation) 0.2148
> > >>>
> > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> > >>>
> > >>>
> > >>> *Naive bayes:*
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 11286 99.0869%
> > >>> Incorrectly Classified Instances : 104 0.9131%
> > >>> Total Classified Instances : 11390
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > > >>> 474 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 | 477 a = alt.atheism
> > > >>> 0 566 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 569 b = comp.graphics
> > > >>> 0 10 590 29 2 4 1 0 0 0 0 0 1 0 0 0 0 0 0 1 | 638 c = comp.os.ms-windows.misc
> > > >>> 0 0 0 596 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 596 d = comp.sys.ibm.pc.hardware
> > > >>> 0 0 0 0 575 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 | 577 e = comp.sys.mac.hardware
> > > >>> 0 2 2 2 0 593 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 601 f = comp.windows.x
> > > >>> 0 0 0 1 0 0 589 1 0 0 1 0 2 0 0 0 0 0 0 0 | 594 g = misc.forsale
> > > >>> 0 0 0 0 0 0 0 594 0 0 0 0 0 0 0 0 0 0 0 0 | 594 h = rec.autos
> > > >>> 0 0 0 0 0 0 0 0 611 0 0 0 0 0 0 0 0 0 0 0 | 611 i = rec.motorcycles
> > > >>> 0 0 0 0 0 0 0 0 0 616 1 0 0 0 0 0 0 0 0 0 | 617 j = rec.sport.baseball
> > > >>> 0 0 0 0 0 0 1 0 0 0 620 0 0 0 0 0 0 0 0 0 | 621 k = rec.sport.hockey
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 580 0 0 0 0 0 1 0 0 | 581 l = sci.crypt
> > > >>> 0 0 0 3 1 0 0 0 0 0 0 0 571 0 0 0 0 0 0 0 | 575 m = sci.electronics
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 0 2 583 0 0 0 0 0 0 | 585 n = sci.med
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 0 0 1 599 0 0 0 0 0 | 600 o = sci.space
> > > >>> 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 | 616 p = soc.religion.christian
> > > >>> 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 560 0 0 0 | 562 q = talk.politics.mideast
> > > >>> 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 548 0 1 | 551 r = talk.politics.guns
> > > >>> 10 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 2 344 1 | 359 s = talk.religion.misc
> > > >>> 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 2 0 462 | 466 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.9847
> > >>> Accuracy 99.0869%
> > >>> Reliability 94.3334%
> > >>> Reliability (standard deviation) 0.2169
> > >>>
> > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 6718 90.1019%
> > >>> Incorrectly Classified Instances : 738 9.8981%
> > >>> Total Classified Instances : 7456
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > > >>> 294 0 0 0 0 0 0 0 0 0 0 2 0 1 1 6 1 1 16 0 | 322 a = alt.atheism
> > > >>> 0 345 6 14 6 11 6 0 0 0 0 5 7 1 3 0 0 0 0 0 | 404 b = comp.graphics
> > > >>> 2 29 177 78 22 19 9 1 0 0 0 4 2 0 1 1 0 0 1 1 | 347 c = comp.os.ms-windows.misc
> > > >>> 1 9 2 335 18 2 10 0 0 0 1 0 8 0 0 0 0 0 0 0 | 386 d = comp.sys.ibm.pc.hardware
> > > >>> 1 4 2 13 347 3 5 1 0 0 1 0 7 1 0 0 0 1 0 0 | 386 e = comp.sys.mac.hardware
> > > >>> 0 20 0 4 0 352 4 0 0 0 0 0 1 1 3 0 1 0 1 0 | 387 f = comp.windows.x
> > > >>> 0 2 0 21 5 1 323 7 2 2 0 2 12 0 3 0 0 0 0 1 | 381 g = misc.forsale
> > > >>> 0 1 0 0 1 0 15 363 8 1 0 0 4 1 0 0 0 1 0 1 | 396 h = rec.autos
> > > >>> 0 1 0 0 0 0 6 6 370 0 0 0 0 1 0 0 0 0 1 0 | 385 i = rec.motorcycles
> > > >>> 1 0 0 1 1 0 2 1 2 362 5 0 2 0 0 0 0 0 0 0 | 377 j = rec.sport.baseball
> > > >>> 0 0 0 1 2 0 0 0 0 3 371 0 0 0 0 0 0 0 0 1 | 378 k = rec.sport.hockey
> > > >>> 0 3 1 0 1 0 2 0 0 0 0 396 0 1 0 0 1 1 1 3 | 410 l = sci.crypt
> > > >>> 0 7 0 7 7 2 6 4 0 0 0 1 369 2 2 0 0 0 0 2 | 409 m = sci.electronics
> > > >>> 0 3 0 2 1 0 2 0 0 0 0 1 4 383 4 0 0 1 0 4 | 405 n = sci.med
> > > >>> 0 5 0 0 1 0 3 0 0 0 0 0 1 0 374 1 0 0 1 1 | 387 o = sci.space
> > > >>> 6 2 0 1 1 0 0 1 0 1 0 0 1 5 0 352 2 1 7 1 | 381 p = soc.religion.christian
> > > >>> 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 373 1 0 1 | 378 q = talk.politics.mideast
> > > >>> 0 0 0 0 0 0 1 0 1 0 0 2 0 0 0 0 0 346 2 7 | 359 r = talk.politics.guns
> > > >>> 26 1 0 1 0 0 0 2 0 1 1 0 0 1 1 20 2 6 200 7 | 269 s = talk.religion.misc
> > > >>> 1 0 0 0 0 0 0 2 0 0 1 0 0 2 2 0 1 14 0 286 | 309 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.8726
> > >>> Accuracy 90.1019%
> > >>> Reliability 85.4491%
> > >>> Reliability (standard deviation) 0.2222
> > >>>
> > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> > >>>
> > >>> *SGD:*
> > >>> 7532 test files
> > >>>
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 5649 75%
> > >>> Incorrectly Classified Instances : 1883 25%
> > >>> Total Classified Instances : 7532
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i j k l m n o p q r s t <--Classified as
> > >>> 186 6 3 10 5 0 33 4 13 15 7 1 24 15 3 15 5 5 29 15 | 394 a = sci.space
> > >>> 5 309 0 3 2 5 0 0 0 1 9 21 2 0 0 18 4 4 1 1 | 385 b = comp.sys.mac.hardware
> > >>> 4 1 101 3 0 1 63 0 7 0 1 1 5 16 3 0 3 7 1 34 | 251 c = talk.religion.misc
> > >>> 11 12 1 265 1 10 3 0 0 17 10 11 5 2 0 11 3 6 21 0 | 389 d = comp.graphics
> > >>> 2 1 1 0 349 2 3 0 3 2 6 1 5 1 0 2 15 2 1 2 | 398 e = rec.motorcycles
> > >>> 7 20 3 19 2 254 6 0 2 11 2 39 7 2 0 4 2 2 9 3 | 394 f = comp.os.ms-windows.misc
> > >>> 2 1 13 0 0 0 247 0 1 1 3 0 6 2 4 0 2 3 5 29 | 319 g = alt.atheism
> > >>> 1 1 0 0 2 0 2 361 0 1 2 0 2 0 0 1 3 22 0 1 | 399 h = rec.sport.hockey
> > >>> 3 0 3 1 0 0 5 0 161 0 1 2 12 102 0 0 1 2 11 6 | 310 i = talk.politics.misc
> > >>> 2 8 0 19 0 19 0 0 1 294 10 11 4 2 0 5 0 3 11 6 | 395 j = comp.windows.x
> > >>> 2 10 0 1 1 0 0 0 0 1 347 13 2 1 0 5 3 2 2 0 | 390 k = misc.forsale
> > >>> 1 36 0 6 1 25 0 0 1 6 10 257 2 1 0 34 6 0 6 0 | 392 l = comp.sys.ibm.pc.hardware
> > >>> 2 2 2 2 1 0 12 0 0 6 10 4 312 5 2 13 11 3 3 6 | 396 m = sci.med
> > >>> 2 0 3 2 1 0 0 1 13 0 5 1 2 314 2 0 2 2 10 4 | 364 n = talk.politics.guns
> > >>> 1 0 2 1 1 0 34 1 33 1 3 0 1 8 271 1 4 5 6 3 | 376 o = talk.politics.mideast
> > >>> 3 14 0 8 2 8 3 1 1 7 12 29 6 2 1 245 13 2 32 4 | 393 p = sci.electronics
> > >>> 3 3 0 2 11 0 1 0 2 1 11 6 4 2 0 11 330 4 4 1 | 396 q = rec.autos
> > >>> 0 0 1 0 1 0 4 12 3 1 3 0 0 0 0 5 6 359 1 1 | 397 r = rec.sport.baseball
> > >>> 0 1 0 0 0 1 0 0 3 3 0 0 3 2 1 6 1 6 366 3 | 396 s = sci.crypt
> > >>> 0 2 11 1 1 0 40 0 1 2 3 4 2 1 0 5 0 2 2 321 | 398 t = soc.religion.christian
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.7073
> > >>> Accuracy 75%
> > >>> Reliability 70.6238%
> > >>> Reliability (standard deviation) 0.2187
> > >>> Log-likelihood mean : -1.1182
> > >>> 25%-ile : -1.6911
> > >>> 75%-ile : -0.0803
> > >>>
> > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
> > >>>
> > >>>> Thanks Andrew for reporting that. I rolled back the release to fix this
> > >>>> and a few other issues.
> > >>>>
> > >>>> We have removed asf-examples*.sh from trunk as the sample file at the
> > >>>> url mentioned in ur email is not available.
> > >>>> This is something we need to fix and restore in 1.0.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> > >>>> ap.dev@outlook.com> wrote:
> > >>>>
> > >>>> from the asf-email-examples.sh script:
> > >>>>
> > >>>> # You will need to download or otherwise obtain some or all of the
> > >>>> Amazon ASF Em
> > >>>> ail Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
> > >>>> to use this
> > >>>> script.
> > >>>> # To obtain a full copy you will need to launch an EC2 instance and
> > >>>> mount the da
> > >>>> taset to download it, otherwise you can get a sample of it at
> > >>>> #
> > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> It looks like the:
> > >>>>
> > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> link is down.
> > >>>>
> > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >>>> > From: andrew.musselman@gmail.com
> > >>>> > To: dev@mahout.apache.org
> > >>>> >
> > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > >>>> >
> > >>>> >
> > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> > >>>> suneel_marthi@yahoo.com>wrote:
> > >>>> >
> > >>>> > > Thanks Andrew M., see that some of the example scripts need to be
> > >>>> fixed as
> > >>>> > > they still refer to the deprecated algorithms.
> > >>>> > > See that the Streaming KMeans has failed for you as well.
> > >>>> > >
> > >>>> > > I'll be rolling back the release today to fix these issues.
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> > >>>> 64-bit
> > >>>> > > Linux AMI from tarball.
> > >>>> > >
> > >>>> > > All tests pass.
> > >>>> > >
> > >>>> > > *Output of examples:*
> > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > >>>> > > <http://mahout.apache.org>:*
> > >>>> > > *recommendations:*
> > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > >>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > >>>> > > 1
> > >>>> > >
> > >>>> > >
> > >>>> [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > >>>> > > 4
> > >>>> > >
> > >>>> > >
> > >>>> [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > >>>> > > 6
> > >>>> > >
> > >>>> > >
> > >>>> [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > >>>> > > 8
> > >>>> > > [12758:1.0,19409:1.0,11112:1.0]
> > >>>> > > 11
> > >>>> > >
> > >>>> > >
> > >>>> [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > >>>> > > 14
> > >>>> > >
> > >>>> > >
> > >>>> [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > >>>> > > 15
> > >>>> > >
> > >>>> > >
> > >>>> [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > >>>> > > 16
> > >>>> > >
> > >>>> > >
> > >>>> [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > >>>> > > 18
> > >>>> > >
> > >>>> > >
> > >>>> [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > >>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > >>>> > > 20
> > >>>> > >
> > >>>> > >
> > >>>> [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; kmeans:*
> > >>>> > > [snip]
> > >>>> > > Weight : [props - optional]: Point:
> > >>>> > > 1.0 :
> > >>>> > > [distance-squared=1.0193102046188427]:
> > >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > >>>> 7573:0.204,
> > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> > >>>> 9779:0.159,
> > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > >>>> > > 1.0 : [distance-squared=0.9823018320457279]:
> > >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > >>>> 5336:0.106,
> > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> > >>>> 7832:0.072,
> > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > >>>> > > 1.0 : [distance-squared=0.9509142993214911]:
> > >>>> > >
> > >>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > >>>> > > 4419:0.076,
> > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > >>>> 7235:0.048,
> > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > >>>> 7683:0.077,
> > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > >>>> 10225:0.081,
> > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > >>>> > > 43685:0.086, 44077:0.308,
> > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; dirichlet:*
> > >>>> > > Get this complaint:
> > >>>> > > Running Dirichlet with K = 8
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > >>>> > > HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB:
> > >>>> > >
> > >>>> > >
> > >>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class:
> > >>>> dirichlet
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props
> > >>>> found on
> > >>>> > > classpath, will use command-line arguments only
> > >>>> > > Unknown program 'dirichlet' chosen.
> > >>>> > >
> > >>>> > > *clustering: minhash:*
> > >>>> > > Running Minhash
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > >>>> > > HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB:
> > >>>> > >
> > >>>> > >
> > >>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found
> > >>>> on
> > >>>> > > classpath, will use command-line arguments only
> > >>>> > > Unknown program 'minhash' chosen.
> > >>>> > >
> > >>>> > > *classification; standard:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances : 5384 87.7874%
> > >>>> > > Incorrectly Classified Instances : 749 12.2126%
> > >>>> > > Total Classified Instances : 6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a b c d <--Classified as
> > >>>> > > 2949 7 531 25 | 3512 a = dev
> > >>>> > > 0 0 0 0 | 0 b = general
> > >>>> > > 99 8 1763 8 | 1878 c = user
> > >>>> > > 41 1 29 672 | 743 d = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa 0.7877
> > >>>> > > Accuracy 87.7874%
> > >>>> > > Reliability 53.658%
> > >>>> > > Reliability (standard deviation) 0.4911
> > >>>> > >
> > >>>> > > *classification; complementary:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances : 5530 90.1679%
> > >>>> > > Incorrectly Classified Instances : 603 9.8321%
> > >>>> > > Total Classified Instances : 6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a b c d <--Classified as
> > >>>> > > 3168 0 276 68 | 3512 a = dev
> > >>>> > > 0 0 0 0 | 0 b = general
> > >>>> > > 196 0 1652 30 | 1878 c = user
> > >>>> > > 25 0 8 710 | 743 d = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa 0.8259
> > >>>> > > Accuracy 90.1679%
> > >>>> > > Reliability 54.7459%
> > >>>> > > Reliability (standard deviation) 0.5005
> > >>>> > >
> > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms
> > >>>> (Minutes:
> > >>>> > > 0.34836666666666666)
> > >>>> > >
> > >>>> > > *classification; sgd, with three categories:*
> > >>>> > > Running SGD Training
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > >>>> > > HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB:
> > >>>> > >
> > >>>> > >
> > >>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> > >>>> > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on
> > >>>> classpath,
> > >>>> > > will use command-line arguments only
> > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > >>>> > > 24168 training files
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000
> > >>>> > > 2
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00
> > >>>> > > 0.00 0.00 0.0000000 0.0000000 12
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000
> > >>>> > > 0.0000000 40
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100
> > >>>> > > 0.000
> > >>>> > > 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00
> > >>>> > > 0.00 0.00 0.0000000 0.0000000 300
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000
> > >>>> > > 0.0000000 800
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1000 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1200 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1400 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1500 -0.607 75.78 none
> > >>>> > > 0.24 43686.00 17924.00 329.50
> > >>>> > > 1.0571799e-08
> > >>>> > > 1.0032261e-08 2000 -0.487 82.65 none
> > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > >>>> > > 1.0011902e-08 2500 -0.439 83.90 none
> > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > >>>> > > 1.0011902e-08 3000 -0.439 83.90 none
> > >>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08
> > >>>> > > 1.0000001e-08 4000 -0.351 88.14 none
> > >>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08
> > >>>> > > 1.0000000e-08 5000 -0.378 87.10 none
> > >>>> > > 0.32 50635.00 36461.00 437.09
> > >>>> > > 1.0556652e-08
> > >>>> > > 1.0000001e-08 6000 -0.372 86.89 none
> > >>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08
> > >>>> > > 1.0000001e-08 7000 -0.334 89.26 none
> > >>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08
> > >>>> > > 1.0000000e-08 8000 -0.368 87.52 none
> > >>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08
> > >>>> > > 1.0000000e-08 10000 -0.374 87.39 none
> > >>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08
> > >>>> > > 1.0000000e-08 12000 -0.298 88.26 none
> > >>>> > > Exception in thread "main" java.lang.IllegalStateException:
> > >>>> > > java.lang.ArrayIndexOutOfBoundsException:
> > >>>> > > 2
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > >>>> > > at
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > >>>> > > at
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > >>>> Method)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > >
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >>>> > > at
> > >>>> > > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >>>> > > at
> > >>>> > > org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > >>>> Method)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > > at
> > >>>> > > org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > > at
> > >>>> > > org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > >>>> > >
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > >>>> > > at
> > >>>> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >>>> > > at
> > >>>> > >
> > >>>> > >
> > >>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>> > > at java.lang.Thread.run(Thread.java:701)
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > > Trying out the build today
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> > >>>> suneel_marthi@yahoo.com
> > >>>> > > >wrote:
> > >>>> > > >
> > >>>> > > >> This is an issue (trivial one though) that needs to be fixed for
> > >>>> 0.9
> > >>>> > > >> Release, will be rerolling the release today (in the next few
> > >>>> hrs) and
> > >>>> > > >> putting out a new release candidate in staging.
> > >>>> > > >>
> > >>>> > > >> Thanks for reporting this Andrew P.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > >>>> > > ap.dev@outlook.com>
> > >>>> > > >> wrote:
> > >>>> > > >>
> > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
> > >>>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > >>>> > > >> have run into some problems because of the Hadoop setup. I ran into some
> > >>>> > > >> problems in the example scripts, particularly with
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> > >>>> "amd64",
> > >>>> > > >> family: "unix"
> > >>>> > > >> $MAHOUT_LOCAL=true
> > >>>> > > >> Hadoop 2.2.0
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed
> > >>>> (tar)
> > >>>> > > >> [passed ]
> > >>>> > > >>
> > >>>> > > >> b) Verify u r able to compile the distro
> > >>>> > > >>
> > >>>> > > >> mvn compile- [passed with warnings]
> > >>>> > > >>
> > >>>> > > >> [WARNING] Expected all dependencies to require Scala
> > >>>> version: 2.9.3
> > >>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires
> > >>>> scala
> > >>>> > > >> version: 2.9.3
> > >>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> > >>>> > > >> version: 2.9.2
> > >>>> > > >> [WARNING] Multiple versions of scala libraries detected!
> > >>>> > > >>
> > >>>> > > >> c) Run through the unit tests: mvn clean test
> > >>>> > > >> mvn clean test [passed]
> > >>>> > > >>
> > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >>>> > > >> Please run through all the different options in each script
> > >>>> > > >>
> > >>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>>> > > >> [...]
> > >>>> > > >> WARNING: Unable to add class:
> > >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >> java.lang.ClassNotFoundException:
> > >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >> at java.security.AccessController.doPrivileged(Native
> > >>>> Method)
> > >>>> > > >> at
> > >>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >> at
> > >>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >> at java.lang.Class.forName0(Native Method)
> > >>>> > > >> at java.lang.Class.forName(Class.java:171)
> > >>>> > > >> at
> > >>>> > > >>
> > >>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >> at
> > >>>> > > >>
> > >>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>>> > > >>
> > >>>> > > >> WARNING: Unable to add class:
> > >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >> java.lang.ClassNotFoundException:
> > >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >> at java.security.AccessController.doPrivileged(Native
> > >>>> Method)
> > >>>> > > >> at
> > >>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >> at
> > >>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >> at java.lang.Class.forName0(Native Method)
> > >>>> > > >> at java.lang.Class.forName(Class.java:171)
> > >>>> > > >> at
> > >>>> > > >>
> > >>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >> at
> > >>>> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >> WARNING: No
> > >>>> > > >>
> > >>>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found
> > >>>> > > on
> > >>>> > > >> classpath, will use command-line arguments only
> > >>>> > > >> Unknown program
> > >>>> > > >> 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job'
> > >>>> chosen.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./classify-20newsgroups.sh ->1 [works]
> > >>>> > > >> ./classify-20newsgroups.sh ->2 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> cluster-reuters.sh ->1 [works]
> > >>>> > > >> cluster-reuters.sh ->2 [works]
> > >>>> > > >> cluster-reuters.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >> Same error as noted previously in the thread:
> > >>>> > > >>
> > >>>> > > >> cluster-reuters.sh ->4 [0 clusters]
> > >>>> > > >>
> > >>>> > > >> [...]
> > >>>> > > >>
> > >>>> > > >> WARNING: No qualcluster.props found on classpath, will use
> > >>>> > > >> command-line arguments only
> > >>>> > > >> Num clusters: 0; maxDistance: 0.000000
> > >>>> > > >> [Dunn Index] First: Infinity
> > >>>> > > >> [Davies-Bouldin Index] First: NaN
> > >>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > >>>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >>>> > > >> > From: suneel_marthi@yahoo.com
> > >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >>>> > > >> >
> > >>>> > > >> > Third time's a Charm!!!
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> > >
> > >>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >>>> > > >> >
> > >>>> > > >> > For those volunteering to test this, some of the things to be verified:
> > >>>> > > >> >
> > >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> > >>>> > > >> > b) Verify u r able to compile the distro
> > >>>> > > >> > c) Run through the unit tests: mvn clean test
> > >>>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Committers
> > >>>> > > >> > and PMC members:
> > >>>> > > >> > ---------------------------------------
> > >>>> > > >> >
> > >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Thanks and Regards.
> > >>>> > > >>
> > >>>> > > >
> > >>>> > > >
> > >>>> > >
> > >>>>
> > >>>
> > >>>
> > >>
> > >
>
RE: MAHOUT 0.9 Release - New URL
Posted by Andrew Palumbo <ap...@outlook.com>.
Everything seems to run well on my local machine:
Checked out revision 1560364.
CentOS 6
Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
Hadoop 2.2.0
mvn clean compile -DskipTests [OK-Several Warnings]
mvn clean test [PASSED ALL]
mvn clean install -DskipTests [OK]
$MAHOUT_LOCAL=true
classify-20newsgroups.sh->1 [Accuracy 89.3529%]
classify-20newsgroups.sh->2 [Accuracy 90.8317%]
classify-20newsgroups.sh->3 [Accuracy 76.2746%]
classify-20newsgroups.sh->4 [cleans up]
cluster-reuters.sh->1 [20 clusters] -kmeans
cluster-reuters.sh->2 [INFO: 20 clusters] -fkmeans
cluster-reuters.sh->3 [OK] -lda
cluster-reuters.sh->4 [10 (9) clusters- see attached] -streaming kmeans
./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]
./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]
Attached is the full output of cluster-reuters.sh->4 Streaming K-Means.
From cluster-reuters.sh->4 Streaming K-Means:
Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 1 [2816]: 3438.913758
Average distance in cluster 2 [112]: 20617.345993
Average distance in cluster 3 [4]: 32504.085379
Average distance in cluster 4 [435]: 18476.579935
Average distance in cluster 5 [27]: 21153.167574
Average distance in cluster 6 [15480]: 2040.864416
Average distance in cluster 7 [1711]: 5281.742482
Average distance in cluster 8 [964]: 15762.976239
Average distance in cluster 9 [28]: 19762.109632
Num clusters: 10; maxDistance: 107106.379648
[Dunn Index] First: 0.002272
[Davies-Bouldin Index] First: 57.871266
Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
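
A quick way to sanity-check the per-cluster summary above is to parse the CSV and flag rows whose quantile columns look inconsistent. The sketch below is only an illustration: the helper name suspicious_rows and the two checks (a negative distance.q0, and quantiles that are not in ascending order) are assumptions of this example, not something the Mahout scripts do. Only four rows are copied from the output above. A distance cannot be negative, so the flagged values may be an artifact of approximate quantile estimation on small clusters; that explanation is a guess and is not confirmed in this thread.

```python
import csv
import io

# Four rows copied verbatim from the summary above (not the full table).
SUMMARY = """\
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
"""


def suspicious_rows(text):
    """Return (cluster id, reasons) for rows whose quantiles look inconsistent.

    Hypothetical helper for illustration only.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(text)):
        # distance.q0 .. distance.q4 are the min, quartiles, and max.
        q = [float(row[f"distance.q{i}"]) for i in range(5)]
        reasons = []
        if q[0] < 0:
            reasons.append("negative minimum distance")
        if q != sorted(q):
            reasons.append("quantiles not monotonic")
        if reasons:
            flagged.append((int(row["cluster"]), reasons))
    return flagged


if __name__ == "__main__":
    for cluster, reasons in suspicious_rows(SUMMARY):
        print(cluster, "; ".join(reasons))
```

On these four rows the check flags clusters 3 and 9, which are also among the smallest clusters (4 and 28 points).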
> From: ap.dev@outlook.com
> To: dev@mahout.apache.org; user@mahout.apache.org
> Subject: RE: MAHOUT 0.9 Release - New URL
> Date: Wed, 22 Jan 2014 09:37:06 -0500
>
> will do!
>
> > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > To: dev@mahout.apache.org; user@mahout.apache.org
> >
> > Andrew M., Andrew P. and others,
> >
> > Sebastian and I fixed a few issues today (for 0.9):
> >
> > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references to, and invocations of, algorithms that have been removed from the codebase.
> > b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
> > c) Resurrected Frequent Pattern Mining implementation for 0.9.
> >
> > Please check out the latest code from trunk, run a build locally, and run through the example scripts.
> >
> > Thanks and Regards.
> >
> > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
> >
> > *factorize-movielens-1M.sh:*
> > RMSE is:
> >
> > 0.8519064098265133
> >
> >
> > Sample recommendations:
> >
> > 2229
> > [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > 5848
> > [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > 3728
> > [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > 1252
> > [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > 634
> > [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > 5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > 2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > 4219
> > [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > 91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > 502
> > [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> >
> > factorize-netflix.sh:
> > References a data set that Netflix took down after the
> > competition and that is no longer available; the script should at least
> > mention that the data set is no longer "online".
> >
> >
> > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > > *clustering-syntheticcontrol.sh*
> > >
> > > *Canopy:*
> > > [snip]
> > > 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > > 10.017, 17.149, 14.850, 10.890]
> > > 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > > 16.430, 11.898, 15.366]
> > > 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > > 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > > 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > > 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > > 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > >
> > > *K-means:*
> > > [snip]
> > > 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > > 36.946, 36.900, 32.593, 31.563]
> > > 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > > 18.967, 35.473, 30.146, 45.527]
> > > 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > > 41.024, 37.105, 45.623, 43.589]
> > > 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > > 24.508, 34.432, 32.155, 34.839]
> > > 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > > 18.111, 14.754, 27.386, 27.026]
> > > 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > > 39.404, 38.034, 46.458, 44.432]
> > > 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > > 28.215, 34.940, 36.910, 39.749]
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > >
> > > *Fuzzy k-means:*
> > > [snip]
> > > 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > 15.285, 22.528, 20.657, 24.129]
> > > 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > 20.229, 11.131, 9.980, 10.720]
> > > 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > 16.546, 15.927, 18.084, 17.475]
> > > 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > 19.310, 12.999, 17.460]
> > > 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > 11.743, 11.699, 10.152]
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Wrote 6 clusters
> > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > >
> > > *Dirichlet and Meanshift:*
> > > Already detailed in M-1400: deprecated jobs are still referenced.
> > >
> > >
> > >
> > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > >> *cluster-reuters.sh*
> > >> *k-means:*
> > >>
> > >> [snip]
> > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> > >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > >> Top Terms:
> > >> banks =>
> > >> 3.841823268955143
> > >> bank =>
> > >> 3.80633066361209
> > >> debt =>
> > >> 3.28065219870794
> > >> said =>
> > >> 2.5965700942088583
> > >> he =>
> > >> 2.335682813857497
> > >> foreign =>
> > >> 2.2217853688201403
> > >> billion =>
> > >> 2.1970193848291335
> > >> would =>
> > >> 1.9932392063955617
> > >> loans =>
> > >> 1.9309276792854233
> > >> interest =>
> > >> 1.787324501938
> > >> have =>
> > >> 1.762981951432578
> > >> its =>
> > >> 1.7615109954971866
> > >> which =>
> > >> 1.5822081148036862
> > >> has =>
> > >> 1.5600708189041956
> > >> dlrs =>
> > >> 1.5571038313005996
> > >> finance =>
> > >> 1.5539758811252924
> > >> new =>
> > >> 1.5176015811577555
> > >> had =>
> > >> 1.5138723701401844
> > >> brazil =>
> > >> 1.5083369853593172
> > >> payments =>
> > >> 1.4539044255886517
> > >> Weight : [props - optional]: Point:
> > >>
> > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> > >> 0.40:0.003, 0.5:0.009, 0.57:0
> > >> Top Terms:
> > >> vs =>
> > >> 6.126130791333171
> > >> net =>
> > >> 4.012191567277523
> > >> cts =>
> > >> 3.822006848832744
> > >> shr =>
> > >> 3.6786004856764527
> > >> mln =>
> > >> 2.9011643584038698
> > >> loss =>
> > >> 2.788368861463607
> > >> qtr =>
> > >> 2.714140225051522
> > >> revs =>
> > >> 2.4739861236454717
> > >> profit =>
> > >> 1.8146888090247015
> > >> note =>
> > >> 1.7977163272138388
> > >> dlrs =>
> > >> 1.6164390808155846
> > >> avg =>
> > >> 1.3901765773336587
> > >> shrs =>
> > >> 1.3856326531419314
> > >> mths =>
> > >> 1.3168717272038506
> > >> 4th =>
> > >> 1.2161158425617289
> > >> oper =>
> > >> 1.182419473776814
> > >> year =>
> > >> 1.178086061733047
> > >> nine =>
> > >> 1.0670554836445316
> > >> 3rd =>
> > >> 1.041334410056592
> > >> inc =>
> > >> 1.0019361981554935
> > >> Weight : [props - optional]: Point:
> > >>
> > >>
> > >> Inter-Cluster Density: 0.45562152681859414
> > >> Intra-Cluster Density: 0.6952712632167628
> > >> CDbw Inter-Cluster Density: 0.0
> > >> CDbw Intra-Cluster Density: 16.486930227598684
> > >> CDbw Separation: 194.49005884464628
> > >>
> > >> *fuzzy k-means:*
> > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.005, 0.02:0.002, 0.0
> > >> Top Terms:
> > >> said =>
> > >> 1.8665592354713065
> > >> its =>
> > >> 1.1335212213411592
> > >> pct =>
> > >> 1.0862816801353348
> > >> dlrs =>
> > >> 1.0854998884993752
> > >> mln =>
> > >> 1.043163996400643
> > >> from =>
> > >> 0.9684961110525736
> > >> has =>
> > >> 0.912161511978058
> > >> company =>
> > >> 0.8754186972808333
> > >> mar =>
> > >> 0.8675333452422878
> > >> inc =>
> > >> 0.7678617590362815
> > >> would =>
> > >> 0.7610968883652675
> > >> he =>
> > >> 0.7459988770503974
> > >> which =>
> > >> 0.7435613119406804
> > >> year =>
> > >> 0.7302840632748394
> > >> u.s =>
> > >> 0.7281061062439116
> > >> shares =>
> > >> 0.7260764102983083
> > >> corp =>
> > >> 0.7179807367808658
> > >> new =>
> > >> 0.7044203783157115
> > >> stock =>
> > >> 0.6962010978721442
> > >> have =>
> > >> 0.6464265467298506
> > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.004, 0.02:0.002, 0.02
> > >> Top Terms:
> > >> said =>
> > >> 1.864911184196927
> > >> dlrs =>
> > >> 1.199286689822081
> > >> mln =>
> > >> 1.1802134783562215
> > >> pct =>
> > >> 1.1529704214798124
> > >> its =>
> > >> 1.1184398851519701
> > >> from =>
> > >> 1.016647848050332
> > >> company =>
> > >> 0.894703604722841
> > >> mar =>
> > >> 0.879986159541356
> > >> has =>
> > >> 0.8642799128491316
> > >> year =>
> > >> 0.8271823503717782
> > >> inc =>
> > >> 0.7871293745341424
> > >> corp =>
> > >> 0.737705498468879
> > >> which =>
> > >> 0.722975201852743
> > >> would =>
> > >> 0.708000816484415
> > >> u.s =>
> > >> 0.7073294276173905
> > >> billion =>
> > >> 0.7055723996916351
> > >> he =>
> > >> 0.7042684217823294
> > >> new =>
> > >> 0.6834737905434939
> > >> shares =>
> > >> 0.6753327384172428
> > >> stock =>
> > >> 0.6576225144041699
> > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> > >> 0.01:0.006, 0.02:0.002, 0.02
> > >> Top Terms:
> > >> said =>
> > >> 1.8796076179735086
> > >> its =>
> > >> 1.172025965452378
> > >> dlrs =>
> > >> 1.130422792460914
> > >> pct =>
> > >> 1.082038255241358
> > >> mln =>
> > >> 1.0772146872767114
> > >> company =>
> > >> 0.9662235879639138
> > >> from =>
> > >> 0.9473172871605616
> > >> has =>
> > >> 0.9224712965830099
> > >> mar =>
> > >> 0.8769325856924421
> > >> inc =>
> > >> 0.8360245257169788
> > >> shares =>
> > >> 0.8334595641384324
> > >> stock =>
> > >> 0.7704621839612175
> > >> corp =>
> > >> 0.7682400250301806
> > >> which =>
> > >> 0.7389988207856137
> > >> would =>
> > >> 0.7339708917389389
> > >> year =>
> > >> 0.7088414843731325
> > >> new =>
> > >> 0.7038109468655172
> > >> he =>
> > >> 0.6993994455501005
> > >> u.s =>
> > >> 0.6772649147622415
> > >> share =>
> > >> 0.6241804830055171
> > >>
> > >> *lda:*
> > >>
> > >> [snip]
> > >> 21539
> > >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > >> 21540
> > >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > >> 21541
> > >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > >> 21542
> > >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > >> 21543
> > >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > >> 21544
> > >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > >> 21545
> > >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > >> 21546
> > >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > >> 21547
> > >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > >> 21548
> > >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> > >> 21549
> > >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> > >> 21550
> > >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> > >> 21551
> > >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> > >> 21552
> > >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> > >> 21553
> > >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> > >> 21554
> > >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> > >> 21555
> > >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> > >> 21556
> > >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> > >> 21557
> > >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> > >> 21558
> > >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> > >> 21559
> > >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> > >> 21560
> > >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> > >> 21561
> > >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> > >> 21562
> > >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> > >> 21563
> > >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> > >> 21564
> > >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> > >> 21565
> > >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> > >> 21566
> > >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> > >> 21567
> > >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> > >> 21568
> > >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> > >> 21569
> > >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> > >> 21570
> > >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> > >> 21571
> > >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> > >> 21572
> > >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> > >> 21573
> > >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> > >> 21574
> > >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> > >> 21575
> > >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> > >> 21576
> > >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> > >> 21577
> > >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> > >>
> > >>
> > >> *Streaming k-means:*
> > >>
> > >> [snip]
> > >> INFO: Number of Centroids: 0
> > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> > >> WARNING: job_local23982482_0001
> > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > >> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > >> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > >> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > >> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > >> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > >> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > >> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > >> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > >>
> > >> [snip]
> > >>
> > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index] First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
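
The streaming k-means trace above reports "Number of Centroids: 0" and then fails a precondition on the train/test split: 10% of 0 vectors was requested for the test set. The `%.1f` and `%d` placeholders appear literally in the message, most likely because Guava's `Preconditions.checkArgument` only substitutes `%s` and appends any unmatched arguments in brackets. The following is a minimal Python sketch of the kind of check the message describes. The function name `split_train_test` and its split rule are hypothetical illustrations, not Mahout's actual `BallKMeans.splitTrainTest` code.

```python
def split_train_test(vectors, test_probability):
    """Split vectors into (train, test) lists.

    Hypothetical helper for illustration only; it mirrors the wording of the
    error in the trace, not Mahout's implementation.
    """
    num_test = int(test_probability * len(vectors))
    num_train = len(vectors) - num_test
    if num_test <= 0 or num_train <= 0:
        # Python's % operator formats %.1f and %d, so the values show up
        # inside the sentence instead of being appended in brackets.
        raise ValueError(
            "Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test"
            % (test_probability * 100, len(vectors))
        )
    return vectors[:num_train], vectors[num_train:]
```

With an empty input, as in the run above, the check fails before any clustering is attempted, which suggests the upstream step produced no centroids rather than the split itself being at fault.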
> > >>
> > >>
> > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> > >> andrew.musselman@gmail.com> wrote:
> > >>
> > >>> *classify-20newsgroups.sh*
> > >>>
> > >>> *Complementary naive bayes:*
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 11207 98.9406%
> > >>> Incorrectly Classified Instances : 120 1.0594%
> > >>> Total Classified Instances : 11327
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i
> > >>> j k l m n o p q r s
> > >>> t <--Classified as
> > >>> 475 0 0 1 0 0 0 0 0
> > >>> 0 0 0 0 0 1 0 1 0 0
> > >>> 0 | 478 a = alt.atheism
> > >>> 0 597 1 1 0 1 1 0 0
> > >>> 0 0 1 0 2 1 0 0 0 0
> > >>> 0 | 605 b = comp.graphics
> > >>> 0 1 620 3 0 1 0 0 0
> > >>> 0 0 1 0 0 1 0 0 0 0
> > >>> 0 | 627 c = comp.os.ms-windows.misc
> > >>> 1 1 1 593 2 0 0 0 0
> > >>> 0 0 0 0 0 0 1 0 0 0
> > >>> 0 | 599 d = comp.sys.ibm.pc.hardware
> > >>> 0 1 1 0 568 0 1 0 0
> > >>> 0 1 1 2 0 0 0 0 1 0
> > >>> 0 | 576 e = comp.sys.mac.hardware
> > >>> 0 4 2 0 0 581 0 0 0
> > >>> 0 0 0 0 0 0 0 0 0 0
> > >>> 0 | 587 f = comp.windows.x
> > >>> 0 0 0 1 2 0 571 3 0
> > >>> 0 1 1 4 1 0 0 0 0 0
> > >>> 0 | 584 g = misc.forsale
> > >>> 0 0 0 1 0 0 0 589 1
> > >>> 0 0 1 1 0 0 0 0 0 0
> > >>> 0 | 593 h = rec.autos
> > >>> 0 0 0 0 0 0 0 1 565
> > >>> 0 0 0 0 0 1 0 0 0 0
> > >>> 0 | 567 i = rec.motorcycles
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 600 2 0 0 0 1 0 0 0 0
> > >>> 0 | 603 j = rec.sport.baseball
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 1 584 0 0 0 0 0 0 0 0
> > >>> 0 | 585 k = rec.sport.hockey
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 579 0 0 0 0 0 1 0
> > >>> 0 | 580 l = sci.crypt
> > >>> 0 0 0 1 3 0 2 0 0
> > >>> 2 0 0 567 1 2 1 0 0 0
> > >>> 0 | 579 m = sci.electronics
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 0 1 605 0 0 0 0 0
> > >>> 0 | 606 n = sci.med
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 0 0 0 602 0 0 0 0
> > >>> 0 | 602 o = sci.space
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 0 0 1 0 602 0 0 1
> > >>> 0 | 604 p = soc.religion.christian
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 0 0 0 0 0 556 0 0
> > >>> 0 | 556 q = talk.politics.mideast
> > >>> 0 0 1 0 0 0 0 0 0
> > >>> 0 0 1 0 0 1 0 0 568 0
> > >>> 0 | 571 r = talk.politics.guns
> > >>> 11 0 0 0 0 0 0 0 0
> > >>> 1 0 0 0 1 3 8 1 4 338
> > >>> 2 | 369 s = talk.religion.misc
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 1 0 0 0 1 0 3 4 0
> > >>> 447 | 456 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.9806
> > >>> Accuracy 98.9406%
> > >>> Reliability 94.0932%
> > >>> Reliability (standard deviation) 0.2163
> > >>>
> > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Complementary Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 6715 89.3071%
> > >>> Incorrectly Classified Instances : 804 10.6929%
> > >>> Total Classified Instances : 7519
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i
> > >>> j k l m n o p q r s
> > >>> t <--Classified as
> > >>> 298 0 0 0 0 0 0 0 0
> > >>> 1 0 0 0 1 2 5 1 0 13
> > >>> 0 | 321 a = alt.atheism
> > >>> 0 298 11 6 1 12 2 2 1
> > >>> 1 3 8 3 4 2 4 1 4 4
> > >>> 1 | 368 b = comp.graphics
> > >>> 1 17 286 16 4 9 6 3 2
> > >>> 0 1 0 1 7 1 0 2 1 0
> > >>> 1 | 358 c = comp.os.ms-windows.misc
> > >>> 2 6 11 309 9 5 14 8 1
> > >>> 0 2 0 6 4 2 0 1 2 1
> > >>> 0 | 383 d = comp.sys.ibm.pc.hardware
> > >>> 0 10 8 7 334 7 5 5 2
> > >>> 0 3 0 2 1 1 0 1 1 0
> > >>> 0 | 387 e = comp.sys.mac.hardware
> > >>> 1 13 7 8 2 355 2 0 2
> > >>> 0 0 5 1 1 3 0 0 1 0
> > >>> 0 | 401 f = comp.windows.x
> > >>> 0 7 11 29 12 9 268 16 8
> > >>> 4 3 2 6 4 2 1 3 1 2
> > >>> 3 | 391 g = misc.forsale
> > >>> 0 1 0 0 3 0 7 362 8
> > >>> 2 2 1 2 0 2 0 1 2 0
> > >>> 4 | 397 h = rec.autos
> > >>> 0 0 0 1 0 0 1 0 423
> > >>> 0 0 0 2 1 0 1 0 0 0
> > >>> 0 | 429 i = rec.motorcycles
> > >>> 0 0 1 0 0 0 0 2 2
> > >>> 371 8 0 2 3 0 2 0 0 0
> > >>> 0 | 391 j = rec.sport.baseball
> > >>> 0 0 1 0 0 0 1 0 0
> > >>> 2 409 0 0 0 0 0 0 0 0
> > >>> 1 | 414 k = rec.sport.hockey
> > >>> 0 0 1 2 1 0 1 0 0
> > >>> 0 0 404 0 0 0 0 0 1 0
> > >>> 1 | 411 l = sci.crypt
> > >>> 0 5 4 11 1 3 7 9 2
> > >>> 5 3 3 339 2 6 0 1 1 2
> > >>> 1 | 405 m = sci.electronics
> > >>> 0 4 0 1 0 0 0 1 0
> > >>> 1 1 0 3 367 3 1 2 0 0
> > >>> 0 | 384 n = sci.med
> > >>> 0 1 2 0 1 0 2 0 0
> > >>> 1 0 0 1 1 375 0 1 0 0
> > >>> 0 | 385 o = sci.space
> > >>> 4 2 1 1 0 0 1 1 2
> > >>> 0 0 1 1 5 1 367 4 0 1
> > >>> 1 | 393 p = soc.religion.christian
> > >>> 0 1 0 0 0 0 0 0 0
> > >>> 2 0 0 0 0 0 2 378 0 1
> > >>> 0 | 384 q = talk.politics.mideast
> > >>> 0 0 0 0 0 2 1 1 1
> > >>> 1 0 3 0 3 0 0 2 319 2
> > >>> 4 | 339 r = talk.politics.guns
> > >>> 32 0 0 1 0 0 0 0 0
> > >>> 1 1 1 0 2 2 26 5 7 175
> > >>> 6 | 259 s = talk.religion.misc
> > >>> 0 0 0 2 0 0 0 0 0
> > >>> 1 2 2 0 1 2 1 10 18 2
> > >>> 278 | 319 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.8594
> > >>> Accuracy 89.3071%
> > >>> Reliability 84.611%
> > >>> Reliability (standard deviation) 0.2148
> > >>>
> > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
> > >>>
> > >>>
> > >>> *Naive bayes:*
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 11286 99.0869%
> > >>> Incorrectly Classified Instances : 104 0.9131%
> > >>> Total Classified Instances : 11390
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i
> > >>> j k l m n o p q r s
> > >>> t <--Classified as
> > >>> 474 0 0 0 0 0 0 0 0
> > >>> 0 0 0 0 0 0 0 0 0 2
> > >>> 1 | 477 a = alt.atheism
> > >>> 0 566 0 2 0 1 0 0 0
> > >>> 0 0 0 0 0 0 0 0 0 0
> > >>> 0 | 569 b = comp.graphics
> > >>> 0 10 590 29 2 4 1 0 0
> > >>> 0 0 0 1 0 0 0 0 0 0
> > >>> 1 | 638 c = comp.os.ms-windows.misc
> > >>> 0 0 0 596 0 0 0 0 0
> > >>> 0 0 0 0 0 0 0 0 0 0
> > >>> 0 | 596 d = comp.sys.ibm.pc.hardware
> > >>> 0 0 0 0 575 0 1 0 0
> > >>> 0 0 0 1 0 0 0 0 0 0
> > >>> 0 | 577 e = comp.sys.mac.hardware
> > >>> 0 2 2 2 0 593 1 0 0
> > >>> 0 0 0 0 0 1 0 0 0 0
> > >>> 0 | 601 f = comp.windows.x
> > >>> 0 0 0 1 0 0 589 1 0
> > >>> 0 1 0 2 0 0 0 0 0 0
> > >>> 0 | 594 g = misc.forsale
> > >>> 0 0 0 0 0 0 0 594 0
> > >>> 0 0 0 0 0 0 0 0 0 0
> > >>> 0 | 594 h = rec.autos
> > >>> 0 0 0 0 0 0 0 0 611
> > >>> 0 0 0 0 0 0 0 0 0 0
> > >>> 0 | 611 i = rec.motorcycles
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 616 1 0 0 0 0 0 0 0 0
> > >>> 0 | 617 j = rec.sport.baseball
> > >>> 0 0 0 0 0 0 1 0 0
> > >>> 0 620 0 0 0 0 0 0 0 0
> > >>> 0 | 621 k = rec.sport.hockey
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 580 0 0 0 0 0 1 0
> > >>> 0 | 581 l = sci.crypt
> > >>> 0 0 0 3 1 0 0 0 0
> > >>> 0 0 0 571 0 0 0 0 0 0
> > >>> 0 | 575 m = sci.electronics
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 0 2 583 0 0 0 0 0
> > >>> 0 | 585 n = sci.med
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 0 0 1 599 0 0 0 0
> > >>> 0 | 600 o = sci.space
> > >>> 0 1 0 0 0 0 0 0 0
> > >>> 0 0 0 0 0 0 615 0 0 0
> > >>> 0 | 616 p = soc.religion.christian
> > >>> 1 0 0 0 0 0 0 0 0
> > >>> 0 0 0 0 0 0 1 560 0 0
> > >>> 0 | 562 q = talk.politics.mideast
> > >>> 0 0 1 0 0 0 0 0 0
> > >>> 0 0 1 0 0 0 0 0 548 0
> > >>> 1 | 551 r = talk.politics.guns
> > >>> 10 0 0 0 0 0 0 0 0
> > >>> 0 0 0 0 0 1 1 0 2 344
> > >>> 1 | 359 s = talk.religion.misc
> > >>> 0 0 0 0 0 0 0 0 0
> > >>> 0 0 1 1 0 0 0 0 2 0
> > >>> 462 | 466 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.9847
> > >>> Accuracy 99.0869%
> > >>> Reliability 94.3334%
> > >>> Reliability (standard deviation) 0.2169
> > >>>
> > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> > >>> + echo 'Testing on holdout set'
> > >>> Testing on holdout set
> > >>>
> > >>> [snip]
> > >>>
> > >>> INFO: Standard NB Results:
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 6718 90.1019%
> > >>> Incorrectly Classified Instances : 738 9.8981%
> > >>> Total Classified Instances : 7456
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i
> > >>> j k l m n o p q r s
> > >>> t <--Classified as
> > >>> 294 0 0 0 0 0 0 0 0
> > >>> 0 0 2 0 1 1 6 1 1 16
> > >>> 0 | 322 a = alt.atheism
> > >>> 0 345 6 14 6 11 6 0 0
> > >>> 0 0 5 7 1 3 0 0 0 0
> > >>> 0 | 404 b = comp.graphics
> > >>> 2 29 177 78 22 19 9 1 0
> > >>> 0 0 4 2 0 1 1 0 0 1
> > >>> 1 | 347 c = comp.os.ms-windows.misc
> > >>> 1 9 2 335 18 2 10 0 0
> > >>> 0 1 0 8 0 0 0 0 0 0
> > >>> 0 | 386 d = comp.sys.ibm.pc.hardware
> > >>> 1 4 2 13 347 3 5 1 0
> > >>> 0 1 0 7 1 0 0 0 1 0
> > >>> 0 | 386 e = comp.sys.mac.hardware
> > >>> 0 20 0 4 0 352 4 0 0
> > >>> 0 0 0 1 1 3 0 1 0 1
> > >>> 0 | 387 f = comp.windows.x
> > >>> 0 2 0 21 5 1 323 7 2
> > >>> 2 0 2 12 0 3 0 0 0 0
> > >>> 1 | 381 g = misc.forsale
> > >>> 0 1 0 0 1 0 15 363 8
> > >>> 1 0 0 4 1 0 0 0 1 0
> > >>> 1 | 396 h = rec.autos
> > >>> 0 1 0 0 0 0 6 6 370
> > >>> 0 0 0 0 1 0 0 0 0 1
> > >>> 0 | 385 i = rec.motorcycles
> > >>> 1 0 0 1 1 0 2 1 2
> > >>> 362 5 0 2 0 0 0 0 0 0
> > >>> 0 | 377 j = rec.sport.baseball
> > >>> 0 0 0 1 2 0 0 0 0
> > >>> 3 371 0 0 0 0 0 0 0 0
> > >>> 1 | 378 k = rec.sport.hockey
> > >>> 0 3 1 0 1 0 2 0 0
> > >>> 0 0 396 0 1 0 0 1 1 1
> > >>> 3 | 410 l = sci.crypt
> > >>> 0 7 0 7 7 2 6 4 0
> > >>> 0 0 1 369 2 2 0 0 0 0
> > >>> 2 | 409 m = sci.electronics
> > >>> 0 3 0 2 1 0 2 0 0
> > >>> 0 0 1 4 383 4 0 0 1 0
> > >>> 4 | 405 n = sci.med
> > >>> 0 5 0 0 1 0 3 0 0
> > >>> 0 0 0 1 0 374 1 0 0 1
> > >>> 1 | 387 o = sci.space
> > >>> 6 2 0 1 1 0 0 1 0
> > >>> 1 0 0 1 5 0 352 2 1 7
> > >>> 1 | 381 p = soc.religion.christian
> > >>> 1 1 0 0 0 0 0 0 0
> > >>> 0 1 0 0 0 0 0 373 1 0
> > >>> 1 | 378 q = talk.politics.mideast
> > >>> 0 0 0 0 0 0 1 0 1
> > >>> 0 0 2 0 0 0 0 0 346 2
> > >>> 7 | 359 r = talk.politics.guns
> > >>> 26 1 0 1 0 0 0 2 0
> > >>> 1 1 0 0 1 1 20 2 6 200
> > >>> 7 | 269 s = talk.religion.misc
> > >>> 1 0 0 0 0 0 0 2 0
> > >>> 0 1 0 0 2 2 0 1 14 0
> > >>> 286 | 309 t = talk.politics.misc
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.8726
> > >>> Accuracy 90.1019%
> > >>> Reliability 85.4491%
> > >>> Reliability (standard deviation) 0.2222
> > >>>
> > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> > >>>
> > >>> *SGD:*
> > >>> 7532 test files
> > >>>
> > >>> =======================================================
> > >>> Summary
> > >>> -------------------------------------------------------
> > >>> Correctly Classified Instances : 5649 75%
> > >>> Incorrectly Classified Instances : 1883 25%
> > >>> Total Classified Instances : 7532
> > >>>
> > >>> =======================================================
> > >>> Confusion Matrix
> > >>> -------------------------------------------------------
> > >>> a b c d e f g h i
> > >>> j k l m n o p q r s
> > >>> t <--Classified as
> > >>> 186 6 3 10 5 0 33 4 13
> > >>> 15 7 1 24 15 3 15 5 5 29
> > >>> 15 | 394 a = sci.space
> > >>> 5 309 0 3 2 5 0 0 0
> > >>> 1 9 21 2 0 0 18 4 4 1
> > >>> 1 | 385 b = comp.sys.mac.hardware
> > >>> 4 1 101 3 0 1 63 0 7
> > >>> 0 1 1 5 16 3 0 3 7 1
> > >>> 34 | 251 c = talk.religion.misc
> > >>> 11 12 1 265 1 10 3 0 0
> > >>> 17 10 11 5 2 0 11 3 6 21
> > >>> 0 | 389 d = comp.graphics
> > >>> 2 1 1 0 349 2 3 0 3
> > >>> 2 6 1 5 1 0 2 15 2 1
> > >>> 2 | 398 e = rec.motorcycles
> > >>> 7 20 3 19 2 254 6 0 2
> > >>> 11 2 39 7 2 0 4 2 2 9
> > >>> 3 | 394 f = comp.os.ms-windows.misc
> > >>> 2 1 13 0 0 0 247 0 1
> > >>> 1 3 0 6 2 4 0 2 3 5
> > >>> 29 | 319 g = alt.atheism
> > >>> 1 1 0 0 2 0 2 361 0
> > >>> 1 2 0 2 0 0 1 3 22 0
> > >>> 1 | 399 h = rec.sport.hockey
> > >>> 3 0 3 1 0 0 5 0 161
> > >>> 0 1 2 12 102 0 0 1 2 11
> > >>> 6 | 310 i = talk.politics.misc
> > >>> 2 8 0 19 0 19 0 0 1
> > >>> 294 10 11 4 2 0 5 0 3 11
> > >>> 6 | 395 j = comp.windows.x
> > >>> 2 10 0 1 1 0 0 0 0
> > >>> 1 347 13 2 1 0 5 3 2 2
> > >>> 0 | 390 k = misc.forsale
> > >>> 1 36 0 6 1 25 0 0 1
> > >>> 6 10 257 2 1 0 34 6 0 6
> > >>> 0 | 392 l = comp.sys.ibm.pc.hardware
> > >>> 2 2 2 2 1 0 12 0 0
> > >>> 6 10 4 312 5 2 13 11 3 3
> > >>> 6 | 396 m = sci.med
> > >>> 2 0 3 2 1 0 0 1 13
> > >>> 0 5 1 2 314 2 0 2 2 10
> > >>> 4 | 364 n = talk.politics.guns
> > >>> 1 0 2 1 1 0 34 1 33
> > >>> 1 3 0 1 8 271 1 4 5 6
> > >>> 3 | 376 o = talk.politics.mideast
> > >>> 3 14 0 8 2 8 3 1 1
> > >>> 7 12 29 6 2 1 245 13 2 32
> > >>> 4 | 393 p = sci.electronics
> > >>> 3 3 0 2 11 0 1 0 2
> > >>> 1 11 6 4 2 0 11 330 4 4
> > >>> 1 | 396 q = rec.autos
> > >>> 0 0 1 0 1 0 4 12 3
> > >>> 1 3 0 0 0 0 5 6 359 1
> > >>> 1 | 397 r = rec.sport.baseball
> > >>> 0 1 0 0 0 1 0 0 3
> > >>> 3 0 0 3 2 1 6 1 6 366
> > >>> 3 | 396 s = sci.crypt
> > >>> 0 2 11 1 1 0 40 0 1
> > >>> 2 3 4 2 1 0 5 0 2 2
> > >>> 321 | 398 t = soc.religion.christian
> > >>>
> > >>> =======================================================
> > >>> Statistics
> > >>> -------------------------------------------------------
> > >>> Kappa 0.7073
> > >>> Accuracy 75%
> > >>> Reliability 70.6238%
> > >>> Reliability (standard deviation) 0.2187
> > >>> Log-likelihood mean : -1.1182
> > >>> 25%-ile : -1.6911
> > >>> 75%-ile : -0.0803
> > >>>
> > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
> > >>>
> > >>>> Thanks Andrew for reporting that. I rolled back the release to fix this
> > >>>> and a few other issues.
> > >>>>
> > >>>> We have removed asf-examples*.sh from trunk as the sample file at the
> > >>>> URL mentioned in your email is not available.
> > >>>> This is something we need to fix and restore in 1.0.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> > >>>> ap.dev@outlook.com> wrote:
> > >>>>
> > >>>> from the asf-email-examples.sh script:
> > >>>>
> > >>>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> > >>>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
> > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> It looks like the:
> > >>>>
> > >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> > >>>>
> > >>>> link is down.
> > >>>>
> > >>>> Is there somewhere else that we can get a subset of the ASF emails?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> > >>>> > From: andrew.musselman@gmail.com
> > >>>> > To: dev@mahout.apache.org
> > >>>> >
> > >>>> > Sure thing; continuing to smoke test the other examples tonight
> > >>>> >
> > >>>> >
> > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> > >>>> suneel_marthi@yahoo.com>wrote:
> > >>>> >
> > >>>> > > Thanks Andrew M. I see that some of the example scripts need to be fixed, as
> > >>>> > > they still refer to the deprecated algorithms.
> > >>>> > > I see that Streaming KMeans has failed for you as well.
> > >>>> > >
> > >>>> > > I'll be rolling back the release today to fix these issues.
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > >
> > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > >>>> > > Linux AMI from tarball.
> > >>>> > >
> > >>>> > > All tests pass.
> > >>>> > >
> > >>>> > > *Output of examples:*
> > >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> > >>>> > > <http://mahout.apache.org>:*
> > >>>> > > *recommendations:*
> > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > >>>> > > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > >>>> > > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > >>>> > > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > >>>> > > 8 [12758:1.0,19409:1.0,11112:1.0]
> > >>>> > > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > >>>> > > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > >>>> > > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > >>>> > > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > >>>> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > >>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > >>>> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > >>>> > > [snip]
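
Each record in the recommendations output above pairs a user ID with a bracketed list of `itemID:score` entries. As a rough sketch for anyone who wants to inspect such a file programmatically, the Python function below parses one record. The name `parse_recommendation_line` is hypothetical, and the sketch assumes the user ID and the bracketed list sit on the same line, separated by whitespace.

```python
def parse_recommendation_line(line):
    """Parse 'userID [item:score,item:score,...]' into (user_id, [(item, score), ...]).

    Illustrative helper only; assumes one complete record per line.
    """
    user_part, bracket, rest = line.partition("[")
    if not bracket:
        raise ValueError("no '[' found in record: %r" % line)
    user_id = int(user_part.strip())
    body = rest.strip().rstrip("]")
    recommendations = []
    for pair in body.split(","):
        if pair:  # skip the empty string produced by an empty list
            item, score = pair.split(":")
            recommendations.append((int(item), float(score)))
    return user_id, recommendations
```

For example, the record for user 8 above would yield the user ID 8 and three (item, score) pairs, each with score 1.0.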
> > >>>> > >
> > >>>> > > *clustering; kmeans:*
> > >>>> > > [snip]
> > >>>> > > Weight : [props - optional]: Point:
> > >>>> > > 1.0 :
> > >>>> > > [distance-squared=1.0193102046188427]:
> > >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> > >>>> 7573:0.204,
> > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> > >>>> 9779:0.159,
> > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > >>>> > > 1.0 : [distance-squared=0.9823018320457279]:
> > >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> > >>>> 5336:0.106,
> > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> > >>>> 7832:0.072,
> > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > >>>> > > 1.0 : [distance-squared=0.9509142993214911]:
> > >>>> > >
> > >>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > >>>> > > 4419:0.076,
> > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> > >>>> 7235:0.048,
> > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> > >>>> 7683:0.077,
> > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> > >>>> 10225:0.081,
> > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > >>>> > > 43685:0.086, 44077:0.308,
> > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > >>>> > > [snip]
> > >>>> > >
> > >>>> > > *clustering; dirichlet:*
> > >>>> > > Get this complaint:
> > >>>> > > Running Dirichlet with K = 8
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'dirichlet' chosen.
> > >>>> > >
> > >>>> > > *clustering: minhash:*
> > >>>> > > Running Minhash
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > >>>> > > Unknown program 'minhash' chosen.
> > >>>> > >
> > >>>> > > *classification; standard:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances : 5384 87.7874%
> > >>>> > > Incorrectly Classified Instances : 749 12.2126%
> > >>>> > > Total Classified Instances : 6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a     b     c     d     <--Classified as
> > >>>> > > 2949  7     531   25    | 3512  a = dev
> > >>>> > > 0     0     0     0     | 0     b = general
> > >>>> > > 99    8     1763  8     | 1878  c = user
> > >>>> > > 41    1     29    672   | 743   d = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa 0.7877
> > >>>> > > Accuracy 87.7874%
> > >>>> > > Reliability 53.658%
> > >>>> > > Reliability (standard deviation) 0.4911
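
The summary figures above follow directly from the confusion matrix: correctly classified instances are the diagonal entries, and accuracy is their share of all entries. A small Python sketch of that arithmetic, using the four-class matrix from this run, is given here. The function name `accuracy_from_confusion` is a hypothetical illustration, and the sketch does not attempt to reproduce Mahout's kappa or reliability figures.

```python
def accuracy_from_confusion(matrix):
    """Return (correct, total, accuracy) for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


# Rows: dev, general, user, commits (values copied from the output above).
standard_nb = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]

correct, total, accuracy = accuracy_from_confusion(standard_nb)
print(correct, total, round(accuracy * 100, 4))
```

The diagonal sums to 5384 out of 6133 instances, which matches the "Correctly Classified Instances" and "Total Classified Instances" lines in the summary.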
> > >>>> > >
> > >>>> > > *classification; complementary:*
> > >>>> > > =======================================================
> > >>>> > > Summary
> > >>>> > > -------------------------------------------------------
> > >>>> > > Correctly Classified Instances : 5530 90.1679%
> > >>>> > > Incorrectly Classified Instances : 603 9.8321%
> > >>>> > > Total Classified Instances : 6133
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Confusion Matrix
> > >>>> > > -------------------------------------------------------
> > >>>> > > a     b     c     d     <--Classified as
> > >>>> > > 3168  0     276   68    | 3512  a = dev
> > >>>> > > 0     0     0     0     | 0     b = general
> > >>>> > > 196   0     1652  30    | 1878  c = user
> > >>>> > > 25    0     8     710   | 743   d = commits
> > >>>> > >
> > >>>> > > =======================================================
> > >>>> > > Statistics
> > >>>> > > -------------------------------------------------------
> > >>>> > > Kappa 0.8259
> > >>>> > > Accuracy 90.1679%
> > >>>> > > Reliability 54.7459%
> > >>>> > > Reliability (standard deviation) 0.5005
> > >>>> > >
> > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > >>>> > >
> > >>>> > > *classification; sgd, with three categories:*
> > >>>> > > Running SGD Training
> > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > >>>> > > 24168 training files
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00
> > >>>> > > 0.00 0.00 0.0000000 0.0000000 12
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000
> > >>>> > > 0.0000000 40
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100
> > >>>> > > 0.000
> > >>>> > > 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00
> > >>>> > > 0.00 0.00 0.0000000 0.0000000 300
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.00 0.00 0.00 0.00 0.0000000
> > >>>> > > 0.0000000 800
> > >>>> > > 0.000 0.00 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1000 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1200 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1400 -0.607 75.78 none
> > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > >>>> > > 1.0019413e-08 1500 -0.607 75.78 none
> > >>>> > > 0.24 43686.00 17924.00 329.50
> > >>>> > > 1.0571799e-08
> > >>>> > > 1.0032261e-08 2000 -0.487 82.65 none
> > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > >>>> > > 1.0011902e-08 2500 -0.439 83.90 none
> > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > >>>> > > 1.0011902e-08 3000 -0.439 83.90 none
> > >>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08
> > >>>> > > 1.0000001e-08 4000 -0.351 88.14 none
> > >>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08
> > >>>> > > 1.0000000e-08 5000 -0.378 87.10 none
> > >>>> > > 0.32 50635.00 36461.00 437.09
> > >>>> > > 1.0556652e-08
> > >>>> > > 1.0000001e-08 6000 -0.372 86.89 none
> > >>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08
> > >>>> > > 1.0000001e-08 7000 -0.334 89.26 none
> > >>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08
> > >>>> > > 1.0000000e-08 8000 -0.368 87.52 none
> > >>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08
> > >>>> > > 1.0000000e-08 10000 -0.374 87.39 none
> > >>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08
> > >>>> > > 1.0000000e-08 12000 -0.298 88.26 none
> > >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > >>>> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > >>>> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > >>>> > > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > >>>> > > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > >>>> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > >>>> > > at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > >>>> > > at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > >>>> > > at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > >>>> > > at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > >>>> > > at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > >>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > >>>> > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > >>>> > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > >>>> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > >>>> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > >>>> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > >>>> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>>> > > at java.lang.Thread.run(Thread.java:701)
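One possible reading of the trace above, which nobody in this thread confirms: the job was started with --categories=[3], while the earlier confusion matrix lists four labels (dev, general, user, commits). If the gradient step keeps one slot per non-reference category, which is an assumption about the Mahout code and not something shown here, then a label with index 3 would write to index 2 of a length-2 vector. The sketch below only illustrates that failure mode in plain Python; `one_hot_target` is a hypothetical helper, not a Mahout function.

```python
def one_hot_target(actual, num_categories):
    """Build a target vector with num_categories - 1 slots.

    Category 0 is treated as the reference class and gets no slot
    (an assumed encoding, used here only for illustration).
    """
    target = [0.0] * (num_categories - 1)
    if actual != 0:
        # Raises IndexError when actual >= num_categories.
        target[actual - 1] = 1.0
    return target


# A label inside the declared range works.
print(one_hot_target(2, 3))

# A fourth label (index 3) does not fit a model declared with 3 categories.
try:
    one_hot_target(3, 3)
except IndexError as err:
    print("label 3 does not fit a 3-category model:", err)
```

If this reading is right, checking that the number of distinct labels in the training data matches the declared category count would be a sensible first step before re-running.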
> > >>>> > >
> > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > >>>> > > andrew.musselman@gmail.com> wrote:
> > >>>> > >
> > >>>> > > > Trying out the build today
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> > >>>> suneel_marthi@yahoo.com
> > >>>> > > >wrote:
> > >>>> > > >
> > >>>> > > >> This is an issue (a trivial one, though) that needs to be fixed for the
> > >>>> > > >> 0.9 Release. I will be rerolling the release today (in the next few hrs)
> > >>>> > > >> and putting out a new release candidate in staging.
> > >>>> > > >>
> > >>>> > > >> Thanks for reporting this, Andrew P.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > >>>> > > ap.dev@outlook.com>
> > >>>> > > >> wrote:
> > >>>> > > >>
> > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
> > >>>> > > >> a bit of trouble getting the Hadoop natives to compile, and therefore may
> > >>>> > > >> have run into some problems because of the Hadoop setup. I ran into some
> > >>>> > > >> problems in the example scripts, particularly with
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> > >>>> "amd64",
> > >>>> > > >> family: "unix"
> > >>>> > > >> $MAHOUT_LOCAL=true
> > >>>> > > >> Hadoop 2.2.0
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
> > >>>> > > >>
> > >>>> > > >> b) Verify u r able to compile the distro
> > >>>> > > >>
> > >>>> > > >> mvn compile- [passed with warnings]
> > >>>> > > >>
> > >>>> > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > >>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > >>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > >>>> > > >> [WARNING] Multiple versions of scala libraries detected!
> > >>>> > > >>
> > >>>> > > >> c) Run through the unit tests: mvn clean test
> > >>>> > > >> mvn clean test [passed]
> > >>>> > > >>
> > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >>>> > > >> Please run through all the different options in each script
> > >>>> > > >>
> > >>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >>>> > > >> [...]
> > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >> at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >> at java.lang.Class.forName0(Native Method)
> > >>>> > > >> at java.lang.Class.forName(Class.java:171)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>>> > > >>
> > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >>>> > > >> at java.security.AccessController.doPrivileged(Native Method)
> > >>>> > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >>>> > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >>>> > > >> at java.lang.Class.forName0(Native Method)
> > >>>> > > >> at java.lang.Class.forName(Class.java:171)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >>>> > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>>> > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > >>>> > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> ./classify-20newsgroups.sh ->1 [works]
> > >>>> > > >> ./classify-20newsgroups.sh ->2 [works]
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> cluster-reuters.sh ->1 [works]
> > >>>> > > >> cluster-reuters.sh ->2 [works]
> > >>>> > > >> cluster-reuters.sh ->3 [works]
> > >>>> > > >>
> > >>>> > > >> Same error as noted previously in the thread:
> > >>>> > > >>
> > >>>> > > >> cluster-reuters.sh ->4 [0 clusters]
> > >>>> > > >>
> > >>>> > > >> [...]
> > >>>> > > >>
> > >>>> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >>>> > > >> Num clusters: 0; maxDistance: 0.000000
> > >>>> > > >> [Dunn Index] First: Infinity
> > >>>> > > >> [Davies-Bouldin Index] First: NaN
> > >>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > >>>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >>>> > > >> > From: suneel_marthi@yahoo.com
> > >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >>>> > > >> >
> > >>>> > > >> > Third time's a Charm!!!
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> > >
> > >>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >>>> > > >> >
> > >>>> > > >> > For those volunteering to test this, some of the things to be
> > >>>> > > verified:
> > >>>> > > >> >
> > >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> > >>>> > > >> > b) Verify u r able to compile the distro
> > >>>> > > >> > c) Run through the unit tests: mvn clean test
> > >>>> > > >> > d) Run the example scripts
> > >>>> > > >> under $MAHOUT_HOME/examples/bin. Please run through all the
> > >>>> different
> > >>>> > > >> options in each script.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Committers
> > >>>> > > >> > and PMC members:
> > >>>> > > >> > ---------------------------------------
> > >>>> > > >> >
> > >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > Thanks and
> > >>>> > > Regards.
> > >>>> > > >>
> > >>>> > > >
> > >>>> > > >
> > >>>> > >
> > >>>>
> > >>>
> > >>>
> > >>
> > >
>
RE: MAHOUT 0.9 Release - New URL
Posted by Andrew Palumbo <ap...@outlook.com>.
will do!
> Date: Wed, 22 Jan 2014 01:24:05 -0800
> From: suneel_marthi@yahoo.com
> Subject: Re: MAHOUT 0.9 Release - New URL
> To: dev@mahout.apache.org; user@mahout.apache.org
>
> Andrew M., Andrew P. and others,
>
> Sebastian and I fixed a few issues today (for 0.9):
>
> a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references/invocations to algorithms that have been removed from the codebase.
> b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
> c) Resurrected Frequent Pattern Mining implementation for 0.9.
>
> Please check out the latest code from trunk, run a build locally and run through the example scripts.
>
> Thanks and Regards.
>
>
>
>
>
>
> On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
>
> *factorize-movielens-1M.sh:*
> RMSE is:
>
> 0.8519064098265133
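For context on the figure above, RMSE is the square root of the mean squared difference between predicted and held-out ratings. A minimal sketch follows; the ratings in it are made-up illustration values, not data from the MovieLens run, and `rmse` is an illustrative helper, not the Mahout evaluator.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted vs. held-out ratings on a 1-5 scale.
predicted = [4.5, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 2.0]
print(rmse(predicted, actual))
```

On a 1 to 5 rating scale, a value around 0.85 means predictions are off by a bit less than one star in the root-mean-square sense.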
>
>
> Sample recommendations:
>
> 2229 [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> 5848 [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> 3728 [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> 1252 [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> 634 [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> 5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> 2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> 4219 [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> 91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> 502 [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
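The sample recommendations above appear to follow the pattern "userID [itemID:score,itemID:score,...]". A minimal parsing sketch under that assumption; `parse_recommendations` is a hypothetical helper, and it expects the user ID and the bracketed list on one line, so output wrapped by a mail client would need to be rejoined first.

```python
def parse_recommendations(line):
    """Parse 'userID [item:score,item:score,...]' into (user, [(item, score), ...])."""
    user_part, _, rest = line.strip().partition("[")
    items = []
    for pair in rest.rstrip("]").split(","):
        item_id, score = pair.split(":")
        items.append((int(item_id), float(score)))
    return int(user_part), items


# One line taken from the sample output above.
user, items = parse_recommendations(
    "91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]"
)
print(user, items[:2])
```

Parsed this way, the output can be sorted, filtered by score, or joined against the movie titles file for a quick sanity check of the recommendations.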
>
> factorize-netflix.sh:
> References a data set that is no longer available (Netflix took it down after the
> competition); the script should at least mention that the data set is no longer
> "online".
>
>
> On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> > *clustering-syntheticcontrol.sh*
> >
> > *Canopy:*
> > [snip]
> > 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > 10.017, 17.149, 14.850, 10.890]
> > 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > 16.430, 11.898, 15.366]
> > 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > 15.285, 22.528, 20.657, 24.129]
> > 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > 20.229, 11.131, 9.980, 10.720]
> > 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > 16.546, 15.927, 18.084, 17.475]
> > 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > 19.310, 12.999, 17.460]
> > 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > 11.743, 11.699, 10.152]
> > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Wrote 6 clusters
> > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> >
> > *K-means:*
> > [snip]
> > 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > 36.946, 36.900, 32.593, 31.563]
> > 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > 18.967, 35.473, 30.146, 45.527]
> > 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > 41.024, 37.105, 45.623, 43.589]
> > 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > 24.508, 34.432, 32.155, 34.839]
> > 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > 18.111, 14.754, 27.386, 27.026]
> > 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > 39.404, 38.034, 46.458, 44.432]
> > 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > 28.215, 34.940, 36.910, 39.749]
> > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Wrote 6 clusters
> > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Program took 16902 ms (Minutes: 0.2817)
> >
> > *Fuzzy k-means:*
> > [snip]
> > 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > 15.285, 22.528, 20.657, 24.129]
> > 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > 20.229, 11.131, 9.980, 10.720]
> > 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > 16.546, 15.927, 18.084, 17.475]
> > 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > 19.310, 12.999, 17.460]
> > 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > 11.743, 11.699, 10.152]
> > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Wrote 6 clusters
> > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> >
> > *Dirichlet and Meanshift:*
> > Already detailed in M-1400, deprecated jobs still referenced.
> >
> >
> >
> > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> >> *cluster-reuters.sh*
> >> *k-means:*
> >>
> >> [snip]
> >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> >> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> >> Top Terms:
> >> banks => 3.841823268955143
> >> bank => 3.80633066361209
> >> debt => 3.28065219870794
> >> said => 2.5965700942088583
> >> he => 2.335682813857497
> >> foreign => 2.2217853688201403
> >> billion => 2.1970193848291335
> >> would => 1.9932392063955617
> >> loans => 1.9309276792854233
> >> interest => 1.787324501938
> >> have => 1.762981951432578
> >> its => 1.7615109954971866
> >> which => 1.5822081148036862
> >> has => 1.5600708189041956
> >> dlrs => 1.5571038313005996
> >> finance => 1.5539758811252924
> >> new => 1.5176015811577555
> >> had => 1.5138723701401844
> >> brazil => 1.5083369853593172
> >> payments => 1.4539044255886517
> >> Weight : [props - optional]: Point:
> >>
> >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> >> 0.40:0.003, 0.5:0.009, 0.57:0
> >> Top Terms:
> >> vs => 6.126130791333171
> >> net => 4.012191567277523
> >> cts => 3.822006848832744
> >> shr => 3.6786004856764527
> >> mln => 2.9011643584038698
> >> loss => 2.788368861463607
> >> qtr => 2.714140225051522
> >> revs => 2.4739861236454717
> >> profit => 1.8146888090247015
> >> note => 1.7977163272138388
> >> dlrs => 1.6164390808155846
> >> avg => 1.3901765773336587
> >> shrs => 1.3856326531419314
> >> mths => 1.3168717272038506
> >> 4th => 1.2161158425617289
> >> oper => 1.182419473776814
> >> year => 1.178086061733047
> >> nine => 1.0670554836445316
> >> 3rd => 1.041334410056592
> >> inc => 1.0019361981554935
> >> Weight : [props - optional]: Point:
> >>
> >>
> >> Inter-Cluster Density: 0.45562152681859414
> >> Intra-Cluster Density: 0.6952712632167628
> >> CDbw Inter-Cluster Density: 0.0
> >> CDbw Intra-Cluster Density: 16.486930227598684
> >> CDbw Separation: 194.49005884464628
> >>
> >> *fuzzy k-means:*
> >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> >> 0.01:0.005, 0.02:0.002, 0.0
> >> Top Terms:
> > >> said     => 1.8665592354713065
> > >> its      => 1.1335212213411592
> > >> pct      => 1.0862816801353348
> > >> dlrs     => 1.0854998884993752
> > >> mln      => 1.043163996400643
> > >> from     => 0.9684961110525736
> > >> has      => 0.912161511978058
> > >> company  => 0.8754186972808333
> > >> mar      => 0.8675333452422878
> > >> inc      => 0.7678617590362815
> > >> would    => 0.7610968883652675
> > >> he       => 0.7459988770503974
> > >> which    => 0.7435613119406804
> > >> year     => 0.7302840632748394
> > >> u.s      => 0.7281061062439116
> > >> shares   => 0.7260764102983083
> > >> corp     => 0.7179807367808658
> > >> new      => 0.7044203783157115
> > >> stock    => 0.6962010978721442
> > >> have     => 0.6464265467298506
> >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> >> 0.01:0.004, 0.02:0.002, 0.02
> >> Top Terms:
> > >> said     => 1.864911184196927
> > >> dlrs     => 1.199286689822081
> > >> mln      => 1.1802134783562215
> > >> pct      => 1.1529704214798124
> > >> its      => 1.1184398851519701
> > >> from     => 1.016647848050332
> > >> company  => 0.894703604722841
> > >> mar      => 0.879986159541356
> > >> has      => 0.8642799128491316
> > >> year     => 0.8271823503717782
> > >> inc      => 0.7871293745341424
> > >> corp     => 0.737705498468879
> > >> which    => 0.722975201852743
> > >> would    => 0.708000816484415
> > >> u.s      => 0.7073294276173905
> > >> billion  => 0.7055723996916351
> > >> he       => 0.7042684217823294
> > >> new      => 0.6834737905434939
> > >> shares   => 0.6753327384172428
> > >> stock    => 0.6576225144041699
> >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> >> 0.01:0.006, 0.02:0.002, 0.02
> >> Top Terms:
> > >> said     => 1.8796076179735086
> > >> its      => 1.172025965452378
> > >> dlrs     => 1.130422792460914
> > >> pct      => 1.082038255241358
> > >> mln      => 1.0772146872767114
> > >> company  => 0.9662235879639138
> > >> from     => 0.9473172871605616
> > >> has      => 0.9224712965830099
> > >> mar      => 0.8769325856924421
> > >> inc      => 0.8360245257169788
> > >> shares   => 0.8334595641384324
> > >> stock    => 0.7704621839612175
> > >> corp     => 0.7682400250301806
> > >> which    => 0.7389988207856137
> > >> would    => 0.7339708917389389
> > >> year     => 0.7088414843731325
> > >> new      => 0.7038109468655172
> > >> he       => 0.6993994455501005
> > >> u.s      => 0.6772649147622415
> > >> share    => 0.6241804830055171
> >>
> >> *lda:*
> >>
> >> [snip]
> >> 21539
> >> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> >> 21540
> >> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> >> 21541
> >> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> >> 21542
> >> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> >> 21543
> >> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> >> 21544
> >> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> >> 21545
> >> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> >> 21546
> >> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> >> 21547
> >> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> >> 21548
> >> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> >> 21549
> >> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> >> 21550
> >> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> >> 21551
> >> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> >> 21552
> >> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> >> 21553
> >> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> >> 21554
> >> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> >> 21555
> >> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> >> 21556
> >> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> >> 21557
> >> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> >> 21558
> >> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> >> 21559
> >> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> >> 21560
> >> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> >> 21561
> >> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> >> 21562
> >> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> >> 21563
> >> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> >> 21564
> >> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> >> 21565
> >> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> >> 21566
> >> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> >> 21567
> >> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> >> 21568
> >> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> >> 21569
> >> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> >> 21570
> >> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> >> 21571
> >> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> >> 21572
> >> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> >> 21573
> >> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> >> 21574
> >> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> >> 21575
> >> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> >> 21576
> >> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> >> 21577
> >> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
> >>
> >>
> >> *Streaming k-means:*
> >>
> >> [snip]
> >> INFO: Number of Centroids: 0
> >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> >> WARNING: job_local23982482_0001
> > >> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> > >>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> > >>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> > >>     at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> > >>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> > >>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> > >>     at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> > >>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> > >>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> > >>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> > >>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> >>
> >> [snip]
> >>
> > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> >> Num clusters: 0; maxDistance: 0.000000
> >> [Dunn Index] First: Infinity
> >> [Davies-Bouldin Index] First: NaN
> >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> >> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
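
The streaming k-means failure above can be read as follows. Guava's Preconditions.checkArgument substitutes only %s placeholders and appends unmatched arguments in square brackets, which is why the message shows "%.1f %% of %d" unformatted, followed by [10.000000149011612, 0]. Read that way, a 10% test split was requested of 0 vectors, matching the "Number of Centroids: 0" line. The Python sketch below shows the kind of check that message implies. It is an illustration under that assumption, not the BallKMeans source, and the name split_train_test is hypothetical.

```python
def split_train_test(points, test_fraction):
    """Split points into (train, test); both parts must be non-empty."""
    num_test = int(test_fraction * len(points))
    if not 0 < num_test < len(points):
        raise ValueError(
            "Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test"
            % (test_fraction * 100, len(points))
        )
    # The first num_test points become the test set, the rest the training set.
    return points[num_test:], points[:num_test]


# Zero centroids, as in the log above, can never be split.
try:
    split_train_test([], 0.1)
except ValueError as err:
    print(err)
```

With an empty input the check fails for any test fraction, so the underlying problem is that no centroids reached the reducer, not the split itself.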
> >>
> >>
> >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> >> andrew.musselman@gmail.com> wrote:
> >>
> >>> *classify-20newsgroups.sh*
> >>>
> >>> *Complementary naive bayes:*
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances : 11207 98.9406%
> >>> Incorrectly Classified Instances : 120 1.0594%
> >>> Total Classified Instances : 11327
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
> > >>> 475   0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0 | 478  a = alt.atheism
> > >>>   0 597   1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0 | 605  b = comp.graphics
> > >>>   0   1 620   3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0 | 627  c = comp.os.ms-windows.misc
> > >>>   1   1   1 593   2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0 | 599  d = comp.sys.ibm.pc.hardware
> > >>>   0   1   1   0 568   0   1   0   0   0   1   1   2   0   0   0   0   1   0   0 | 576  e = comp.sys.mac.hardware
> > >>>   0   4   2   0   0 581   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 587  f = comp.windows.x
> > >>>   0   0   0   1   2   0 571   3   0   0   1   1   4   1   0   0   0   0   0   0 | 584  g = misc.forsale
> > >>>   0   0   0   1   0   0   0 589   1   0   0   1   1   0   0   0   0   0   0   0 | 593  h = rec.autos
> > >>>   0   0   0   0   0   0   0   1 565   0   0   0   0   0   1   0   0   0   0   0 | 567  i = rec.motorcycles
> > >>>   0   0   0   0   0   0   0   0   0 600   2   0   0   0   1   0   0   0   0   0 | 603  j = rec.sport.baseball
> > >>>   0   0   0   0   0   0   0   0   0   1 584   0   0   0   0   0   0   0   0   0 | 585  k = rec.sport.hockey
> > >>>   0   0   0   0   0   0   0   0   0   0   0 579   0   0   0   0   0   1   0   0 | 580  l = sci.crypt
> > >>>   0   0   0   1   3   0   2   0   0   2   0   0 567   1   2   1   0   0   0   0 | 579  m = sci.electronics
> > >>>   0   0   0   0   0   0   0   0   0   0   0   0   1 605   0   0   0   0   0   0 | 606  n = sci.med
> > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   0 602   0   0   0   0   0 | 602  o = sci.space
> > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0 602   0   0   1   0 | 604  p = soc.religion.christian
> > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 556   0   0   0 | 556  q = talk.politics.mideast
> > >>>   0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0 568   0   0 | 571  r = talk.politics.guns
> > >>>  11   0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4 338   2 | 369  s = talk.religion.misc
> > >>>   0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0 447 | 456  t = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa 0.9806
> >>> Accuracy 98.9406%
> >>> Reliability 94.0932%
> >>> Reliability (standard deviation) 0.2163
> >>>
> >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 15870 ms (Minutes: 0.2645)
> >>> + echo 'Testing on holdout set'
> >>> Testing on holdout set
> > >>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
> >>>
> >>> [snip]
> >>>
> >>> INFO: Complementary Results:
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances : 6715 89.3071%
> >>> Incorrectly Classified Instances : 804 10.6929%
> >>> Total Classified Instances : 7519
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
> > >>> 298   0   0   0   0   0   0   0   0   1   0   0   0   1   2   5   1   0  13   0 | 321  a = alt.atheism
> > >>>   0 298  11   6   1  12   2   2   1   1   3   8   3   4   2   4   1   4   4   1 | 368  b = comp.graphics
> > >>>   1  17 286  16   4   9   6   3   2   0   1   0   1   7   1   0   2   1   0   1 | 358  c = comp.os.ms-windows.misc
> > >>>   2   6  11 309   9   5  14   8   1   0   2   0   6   4   2   0   1   2   1   0 | 383  d = comp.sys.ibm.pc.hardware
> > >>>   0  10   8   7 334   7   5   5   2   0   3   0   2   1   1   0   1   1   0   0 | 387  e = comp.sys.mac.hardware
> > >>>   1  13   7   8   2 355   2   0   2   0   0   5   1   1   3   0   0   1   0   0 | 401  f = comp.windows.x
> > >>>   0   7  11  29  12   9 268  16   8   4   3   2   6   4   2   1   3   1   2   3 | 391  g = misc.forsale
> > >>>   0   1   0   0   3   0   7 362   8   2   2   1   2   0   2   0   1   2   0   4 | 397  h = rec.autos
> > >>>   0   0   0   1   0   0   1   0 423   0   0   0   2   1   0   1   0   0   0   0 | 429  i = rec.motorcycles
> > >>>   0   0   1   0   0   0   0   2   2 371   8   0   2   3   0   2   0   0   0   0 | 391  j = rec.sport.baseball
> > >>>   0   0   1   0   0   0   1   0   0   2 409   0   0   0   0   0   0   0   0   1 | 414  k = rec.sport.hockey
> > >>>   0   0   1   2   1   0   1   0   0   0   0 404   0   0   0   0   0   1   0   1 | 411  l = sci.crypt
> > >>>   0   5   4  11   1   3   7   9   2   5   3   3 339   2   6   0   1   1   2   1 | 405  m = sci.electronics
> > >>>   0   4   0   1   0   0   0   1   0   1   1   0   3 367   3   1   2   0   0   0 | 384  n = sci.med
> > >>>   0   1   2   0   1   0   2   0   0   1   0   0   1   1 375   0   1   0   0   0 | 385  o = sci.space
> > >>>   4   2   1   1   0   0   1   1   2   0   0   1   1   5   1 367   4   0   1   1 | 393  p = soc.religion.christian
> > >>>   0   1   0   0   0   0   0   0   0   2   0   0   0   0   0   2 378   0   1   0 | 384  q = talk.politics.mideast
> > >>>   0   0   0   0   0   2   1   1   1   1   0   3   0   3   0   0   2 319   2   4 | 339  r = talk.politics.guns
> > >>>  32   0   0   1   0   0   0   0   0   1   1   1   0   2   2  26   5   7 175   6 | 259  s = talk.religion.misc
> > >>>   0   0   0   2   0   0   0   0   0   1   2   2   0   1   2   1  10  18   2 278 | 319  t = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa 0.8594
> >>> Accuracy 89.3071%
> >>> Reliability 84.611%
> >>> Reliability (standard deviation) 0.2148
> >>>
> >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
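
The Summary percentages in these reports follow directly from the counts: the correctly classified instances are the diagonal of the confusion matrix, and the accuracy is that diagonal sum divided by the total. Here is a minimal Python sketch of the arithmetic. The function names are hypothetical, and the 2x2 matrix is made-up illustration data rather than anything from the runs above.

```python
def accuracy_percent(correct, total):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


def accuracy_from_matrix(matrix):
    """Accuracy from a square confusion matrix (rows = actual, columns = predicted)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return accuracy_percent(correct, total)


# Counts taken from the two Complementary naive bayes summaries above.
print(round(accuracy_percent(11207, 11327), 4))  # training set
print(round(accuracy_percent(6715, 7519), 4))    # holdout set

# Hypothetical 2x2 matrix: 85 of 100 instances lie on the diagonal.
print(accuracy_from_matrix([[40, 10], [5, 45]]))
```

The Kappa and Reliability lines are left out of the sketch, since the report does not state how they are computed.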
> >>>
> >>>
> >>> *Naive bayes:*
> >>> INFO: Standard NB Results:
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances : 11286 99.0869%
> >>> Incorrectly Classified Instances : 104 0.9131%
> >>> Total Classified Instances : 11390
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
> > >>> 474   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2   1 | 477  a = alt.atheism
> > >>>   0 566   0   2   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 569  b = comp.graphics
> > >>>   0  10 590  29   2   4   1   0   0   0   0   0   1   0   0   0   0   0   0   1 | 638  c = comp.os.ms-windows.misc
> > >>>   0   0   0 596   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 596  d = comp.sys.ibm.pc.hardware
> > >>>   0   0   0   0 575   0   1   0   0   0   0   0   1   0   0   0   0   0   0   0 | 577  e = comp.sys.mac.hardware
> > >>>   0   2   2   2   0 593   1   0   0   0   0   0   0   0   1   0   0   0   0   0 | 601  f = comp.windows.x
> > >>>   0   0   0   1   0   0 589   1   0   0   1   0   2   0   0   0   0   0   0   0 | 594  g = misc.forsale
> > >>>   0   0   0   0   0   0   0 594   0   0   0   0   0   0   0   0   0   0   0   0 | 594  h = rec.autos
> > >>>   0   0   0   0   0   0   0   0 611   0   0   0   0   0   0   0   0   0   0   0 | 611  i = rec.motorcycles
> > >>>   0   0   0   0   0   0   0   0   0 616   1   0   0   0   0   0   0   0   0   0 | 617  j = rec.sport.baseball
> > >>>   0   0   0   0   0   0   1   0   0   0 620   0   0   0   0   0   0   0   0   0 | 621  k = rec.sport.hockey
> > >>>   0   0   0   0   0   0   0   0   0   0   0 580   0   0   0   0   0   1   0   0 | 581  l = sci.crypt
> > >>>   0   0   0   3   1   0   0   0   0   0   0   0 571   0   0   0   0   0   0   0 | 575  m = sci.electronics
> > >>>   0   0   0   0   0   0   0   0   0   0   0   0   2 583   0   0   0   0   0   0 | 585  n = sci.med
> > >>>   0   0   0   0   0   0   0   0   0   0   0   0   0   1 599   0   0   0   0   0 | 600  o = sci.space
> > >>>   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0 615   0   0   0   0 | 616  p = soc.religion.christian
> > >>>   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1 560   0   0   0 | 562  q = talk.politics.mideast
> > >>>   0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0 548   0   1 | 551  r = talk.politics.guns
> > >>>  10   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   2 344   1 | 359  s = talk.religion.misc
> > >>>   0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   2   0 462 | 466  t = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa 0.9847
> >>> Accuracy 99.0869%
> >>> Reliability 94.3334%
> >>> Reliability (standard deviation) 0.2169
> >>>
> >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 14304 ms (Minutes: 0.2384)
> >>> + echo 'Testing on holdout set'
> >>> Testing on holdout set
> >>>
> >>> [snip]
> >>>
> >>> INFO: Standard NB Results:
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances : 6718 90.1019%
> >>> Incorrectly Classified Instances : 738 9.8981%
> >>> Total Classified Instances : 7456
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
> > >>> 294   0   0   0   0   0   0   0   0   0   0   2   0   1   1   6   1   1  16   0 | 322  a = alt.atheism
> > >>>   0 345   6  14   6  11   6   0   0   0   0   5   7   1   3   0   0   0   0   0 | 404  b = comp.graphics
> > >>>   2  29 177  78  22  19   9   1   0   0   0   4   2   0   1   1   0   0   1   1 | 347  c = comp.os.ms-windows.misc
> > >>>   1   9   2 335  18   2  10   0   0   0   1   0   8   0   0   0   0   0   0   0 | 386  d = comp.sys.ibm.pc.hardware
> > >>>   1   4   2  13 347   3   5   1   0   0   1   0   7   1   0   0   0   1   0   0 | 386  e = comp.sys.mac.hardware
> > >>>   0  20   0   4   0 352   4   0   0   0   0   0   1   1   3   0   1   0   1   0 | 387  f = comp.windows.x
> > >>>   0   2   0  21   5   1 323   7   2   2   0   2  12   0   3   0   0   0   0   1 | 381  g = misc.forsale
> > >>>   0   1   0   0   1   0  15 363   8   1   0   0   4   1   0   0   0   1   0   1 | 396  h = rec.autos
> > >>>   0   1   0   0   0   0   6   6 370   0   0   0   0   1   0   0   0   0   1   0 | 385  i = rec.motorcycles
> > >>>   1   0   0   1   1   0   2   1   2 362   5   0   2   0   0   0   0   0   0   0 | 377  j = rec.sport.baseball
> > >>>   0   0   0   1   2   0   0   0   0   3 371   0   0   0   0   0   0   0   0   1 | 378  k = rec.sport.hockey
> > >>>   0   3   1   0   1   0   2   0   0   0   0 396   0   1   0   0   1   1   1   3 | 410  l = sci.crypt
> > >>>   0   7   0   7   7   2   6   4   0   0   0   1 369   2   2   0   0   0   0   2 | 409  m = sci.electronics
> > >>>   0   3   0   2   1   0   2   0   0   0   0   1   4 383   4   0   0   1   0   4 | 405  n = sci.med
> > >>>   0   5   0   0   1   0   3   0   0   0   0   0   1   0 374   1   0   0   1   1 | 387  o = sci.space
> > >>>   6   2   0   1   1   0   0   1   0   1   0   0   1   5   0 352   2   1   7   1 | 381  p = soc.religion.christian
> > >>>   1   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0 373   1   0   1 | 378  q = talk.politics.mideast
> > >>>   0   0   0   0   0   0   1   0   1   0   0   2   0   0   0   0   0 346   2   7 | 359  r = talk.politics.guns
> > >>>  26   1   0   1   0   0   0   2   0   1   1   0   0   1   1  20   2   6 200   7 | 269  s = talk.religion.misc
> > >>>   1   0   0   0   0   0   0   2   0   0   1   0   0   2   2   0   1  14   0 286 | 309  t = talk.politics.misc
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa 0.8726
> >>> Accuracy 90.1019%
> >>> Reliability 85.4491%
> >>> Reliability (standard deviation) 0.2222
> >>>
> >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 10878 ms (Minutes: 0.1813)
> >>>
> >>> *SGD:*
> >>> 7532 test files
> >>>
> >>> =======================================================
> >>> Summary
> >>> -------------------------------------------------------
> >>> Correctly Classified Instances : 5649 75%
> >>> Incorrectly Classified Instances : 1883 25%
> >>> Total Classified Instances : 7532
> >>>
> >>> =======================================================
> >>> Confusion Matrix
> >>> -------------------------------------------------------
> > >>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
> > >>> 186   6   3  10   5   0  33   4  13  15   7   1  24  15   3  15   5   5  29  15 | 394  a = sci.space
> > >>>   5 309   0   3   2   5   0   0   0   1   9  21   2   0   0  18   4   4   1   1 | 385  b = comp.sys.mac.hardware
> > >>>   4   1 101   3   0   1  63   0   7   0   1   1   5  16   3   0   3   7   1  34 | 251  c = talk.religion.misc
> > >>>  11  12   1 265   1  10   3   0   0  17  10  11   5   2   0  11   3   6  21   0 | 389  d = comp.graphics
> > >>>   2   1   1   0 349   2   3   0   3   2   6   1   5   1   0   2  15   2   1   2 | 398  e = rec.motorcycles
> > >>>   7  20   3  19   2 254   6   0   2  11   2  39   7   2   0   4   2   2   9   3 | 394  f = comp.os.ms-windows.misc
> > >>>   2   1  13   0   0   0 247   0   1   1   3   0   6   2   4   0   2   3   5  29 | 319  g = alt.atheism
> > >>>   1   1   0   0   2   0   2 361   0   1   2   0   2   0   0   1   3  22   0   1 | 399  h = rec.sport.hockey
> > >>>   3   0   3   1   0   0   5   0 161   0   1   2  12 102   0   0   1   2  11   6 | 310  i = talk.politics.misc
> > >>>   2   8   0  19   0  19   0   0   1 294  10  11   4   2   0   5   0   3  11   6 | 395  j = comp.windows.x
> > >>>   2  10   0   1   1   0   0   0   0   1 347  13   2   1   0   5   3   2   2   0 | 390  k = misc.forsale
> > >>>   1  36   0   6   1  25   0   0   1   6  10 257   2   1   0  34   6   0   6   0 | 392  l = comp.sys.ibm.pc.hardware
> > >>>   2   2   2   2   1   0  12   0   0   6  10   4 312   5   2  13  11   3   3   6 | 396  m = sci.med
> > >>>   2   0   3   2   1   0   0   1  13   0   5   1   2 314   2   0   2   2  10   4 | 364  n = talk.politics.guns
> > >>>   1   0   2   1   1   0  34   1  33   1   3   0   1   8 271   1   4   5   6   3 | 376  o = talk.politics.mideast
> > >>>   3  14   0   8   2   8   3   1   1   7  12  29   6   2   1 245  13   2  32   4 | 393  p = sci.electronics
> > >>>   3   3   0   2  11   0   1   0   2   1  11   6   4   2   0  11 330   4   4   1 | 396  q = rec.autos
> > >>>   0   0   1   0   1   0   4  12   3   1   3   0   0   0   0   5   6 359   1   1 | 397  r = rec.sport.baseball
> > >>>   0   1   0   0   0   1   0   0   3   3   0   0   3   2   1   6   1   6 366   3 | 396  s = sci.crypt
> > >>>   0   2  11   1   1   0  40   0   1   2   3   4   2   1   0   5   0   2   2 321 | 398  t = soc.religion.christian
> >>>
> >>> =======================================================
> >>> Statistics
> >>> -------------------------------------------------------
> >>> Kappa 0.7073
> >>> Accuracy 75%
> >>> Reliability 70.6238%
> >>> Reliability (standard deviation) 0.2187
> >>> Log-likelihood mean : -1.1182
> >>> 25%-ile : -1.6911
> >>> 75%-ile : -0.0803
> >>>
> >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
> >>>
> > >>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
> > >>>> and a few other issues.
> > >>>>
> > >>>> We have removed asf-examples*.sh from trunk, as the sample file at the
> > >>>> URL mentioned in your email is not available.
> > >>>> This is something we need to fix and restore in 1.0.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
> >>>> ap.dev@outlook.com> wrote:
> >>>>
> >>>> from the asf-email-examples.sh script:
> >>>>
> > >>>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> > >>>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
> > >>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> >>>>
> >>>> It looks like the:
> >>>>
> >>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> >>>>
> >>>> link is down.
> >>>>
> >>>> Is there somewhere else that we can get a subset of the ASF emails?
> >>>>
> >>>>
> >>>>
> >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
> >>>> > Subject: Re: MAHOUT 0.9 Release - New URL
> >>>> > From: andrew.musselman@gmail.com
> >>>> > To: dev@mahout.apache.org
> >>>> >
> >>>> > Sure thing; continuing to smoke test the other examples tonight
> >>>> >
> >>>> >
> >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
> >>>> suneel_marthi@yahoo.com>wrote:
> >>>> >
> > >>>> > > Thanks, Andrew M. I see that some of the example scripts need to be
> > >>>> > > fixed, as they still refer to the deprecated algorithms.
> > >>>> > > I also see that Streaming KMeans has failed for you.
> >>>> > >
> >>>> > > I'll be rolling back the release today to fix these issues.
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> >>>> > > andrew.musselman@gmail.com> wrote:
> >>>> > >
> >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> >>>> 64-bit
> >>>> > > Linux AMI from tarball.
> >>>> > >
> >>>> > > All tests pass.
> >>>> > >
> >>>> > > *Output of examples:*
> >>>> > > *asf-email-examples.sh, run on mahout.apache.org
> >>>> > > <http://mahout.apache.org>:*
> >>>> > > *recommendations:*
> >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> >>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> >>>> > > 1
> >>>> > >
> >>>> > >
> >>>> [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> >>>> > > 4
> >>>> > >
> >>>> > >
> >>>> [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> >>>> > > 6
> >>>> > >
> >>>> > >
> >>>> [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> >>>> > > 8
> >>>> > > [12758:1.0,19409:1.0,11112:1.0]
> >>>> > > 11
> >>>> > >
> >>>> > >
> >>>> [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> >>>> > > 14
> >>>> > >
> >>>> > >
> >>>> [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> >>>> > > 15
> >>>> > >
> >>>> > >
> >>>> [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> >>>> > > 16
> >>>> > >
> >>>> > >
> >>>> [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> >>>> > > 18
> >>>> > >
> >>>> > >
> >>>> [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> >>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> >>>> > > 20
> >>>> > >
> >>>> > >
> >>>> [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> >>>> > > [snip]
> >>>> > >
> >>>> > > *clustering; kmeans:*
> >>>> > > [snip]
> >>>> > > Weight : [props - optional]: Point:
> >>>> > > 1.0 :
> >>>> > > [distance-squared=1.0193102046188427]:
> >>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> >>>> 7573:0.204,
> >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
> >>>> 9779:0.159,
> >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> >>>> > > 1.0 : [distance-squared=0.9823018320457279]:
> >>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> >>>> 5336:0.106,
> >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
> >>>> 7832:0.072,
> >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> >>>> > > 1.0 : [distance-squared=0.9509142993214911]:
> >>>> > >
> >>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> >>>> > > 4419:0.076,
> >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
> >>>> 7235:0.048,
> >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
> >>>> 7683:0.077,
> >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> >>>> 10225:0.081,
> >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> >>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
> >>>> > > 43685:0.086, 44077:0.308,
> >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> >>>> > > [snip]
> >>>> > >
> >>>> > > *clustering; dirichlet:*
> >>>> > > Get this complaint:
> >>>> > > Running Dirichlet with K = 8
> >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> >>>> > > Unknown program 'dirichlet' chosen.
> >>>> > >
> >>>> > > *clustering: minhash:*
> >>>> > > Running Minhash
> >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> >>>> > > Unknown program 'minhash' chosen.
> >>>> > >
> >>>> > > *classification; standard:*
> >>>> > > =======================================================
> >>>> > > Summary
> >>>> > > -------------------------------------------------------
> >>>> > > Correctly Classified Instances : 5384 87.7874%
> >>>> > > Incorrectly Classified Instances : 749 12.2126%
> >>>> > > Total Classified Instances : 6133
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Confusion Matrix
> >>>> > > -------------------------------------------------------
> >>>> > > a       b       c       d       <--Classified as
> >>>> > > 2949    7       531     25      |  3512   a = dev
> >>>> > > 0       0       0       0       |  0      b = general
> >>>> > > 99      8       1763    8       |  1878   c = user
> >>>> > > 41      1       29      672     |  743    d = commits
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Statistics
> >>>> > > -------------------------------------------------------
> >>>> > > Kappa 0.7877
> >>>> > > Accuracy 87.7874%
> >>>> > > Reliability 53.658%
> >>>> > > Reliability (standard deviation) 0.4911
> >>>> > >
> >>>> > > *classification; complementary:*
> >>>> > > =======================================================
> >>>> > > Summary
> >>>> > > -------------------------------------------------------
> >>>> > > Correctly Classified Instances : 5530 90.1679%
> >>>> > > Incorrectly Classified Instances : 603 9.8321%
> >>>> > > Total Classified Instances : 6133
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Confusion Matrix
> >>>> > > -------------------------------------------------------
> >>>> > > a       b       c       d       <--Classified as
> >>>> > > 3168    0       276     68      |  3512   a = dev
> >>>> > > 0       0       0       0       |  0      b = general
> >>>> > > 196     0       1652    30      |  1878   c = user
> >>>> > > 25      0       8       710     |  743    d = commits
> >>>> > >
> >>>> > > =======================================================
> >>>> > > Statistics
> >>>> > > -------------------------------------------------------
> >>>> > > Kappa 0.8259
> >>>> > > Accuracy 90.1679%
> >>>> > > Reliability 54.7459%
> >>>> > > Reliability (standard deviation) 0.5005
> >>>> > >
> >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> >>>> > >
> >>>> > > *classification; sgd, with three categories:*
> >>>> > > Running SGD Training
> >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> >>>> > > 24168 training files
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> >>>> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> >>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> >>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
> >>>> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> >>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> >>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> >>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> >>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> >>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> >>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
> >>>> > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >>>> > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >>>> > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
> >>>> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> >>>> > >     at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> >>>> > >     at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> >>>> > >     at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> >>>> > >     at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> >>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> >>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> >>>> > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >>>> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >>>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> >>>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>>> > >     at java.lang.Thread.run(Thread.java:701)
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> >>>> > > andrew.musselman@gmail.com> wrote:
> >>>> > >
> >>>> > > > Trying out the build today
> >>>> > > >
> >>>> > > >
> >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> >>>> suneel_marthi@yahoo.com
> >>>> > > >wrote:
> >>>> > > >
> >>>> > > >> This is an issue (trivial one though) that needs to be fixed for
> >>>> 0.9
> >>>> > > >> Release, will be rerolling the release today (in the next few
> >>>> hrs) and
> >>>> > > >> putting out a new release candidate in staging.
> >>>> > > >>
> >>>> > > >> Thanks for reporting this Andrew P.
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> >>>> > > ap.dev@outlook.com>
> >>>> > > >> wrote:
> >>>> > > >>
> >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
> >>>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> >>>> > > >> have run into some problems because of the Hadoop setup. Ran into some
> >>>> > > >> problems in the example scripts, particularly with
> >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> >>>> "amd64",
> >>>> > > >> family: "unix"
> >>>> > > >> $MAHOUT_LOCAL=true
> >>>> > > >> Hadoop 2.2.0
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed
> >>>> (tar)
> >>>> > > >> [passed ]
> >>>> > > >>
> >>>> > > >> b) Verify u r able to compile the distro
> >>>> > > >>
> >>>> > > >> mvn compile- [passed with warnings]
> >>>> > > >>
> >>>> > > >> [WARNING] Expected all dependencies to require Scala
> >>>> version: 2.9.3
> >>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires
> >>>> scala
> >>>> > > >> version: 2.9.3
> >>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> >>>> > > >> version: 2.9.2
> >>>> > > >> [WARNING] Multiple versions of scala libraries detected!
> >>>> > > >>
> >>>> > > >> c) Run through the unit tests: mvn clean test
> >>>> > > >> mvn clean test [passed]
> >>>> > > >>
> >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> >>>> > > >> Please run through all the different options in each script
> >>>> > > >>
> >>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
> >>>> > > >>
> >>>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> >>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> >>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> >>>> > > >> [...]
> >>>> > > >> WARNING: Unable to add class:
> >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >>>> > > >> java.lang.ClassNotFoundException:
> >>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >>>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >>>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
> >>>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >>>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >>>> > > >>     at java.lang.Class.forName0(Native Method)
> >>>> > > >>     at java.lang.Class.forName(Class.java:171)
> >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> >>>> > > >>
> >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >>>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >>>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
> >>>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >>>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >>>> > > >>     at java.lang.Class.forName0(Native Method)
> >>>> > > >>     at java.lang.Class.forName(Class.java:171)
> >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> >>>> > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> >>>> > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> ./classify-20newsgroups.sh ->1 [works]
> >>>> > > >> ./classify-20newsgroups.sh ->2 [works]
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> cluster-reuters.sh ->1 [works]
> >>>> > > >> cluster-reuters.sh ->2 [works]
> >>>> > > >> cluster-reuters.sh ->3 [works]
> >>>> > > >>
> >>>> > > >> Same error as noted previously in the thread:
> >>>> > > >>
> >>>> > > >> cluster-reuters.sh ->4 [0 clusters]
> >>>> > > >>
> >>>> > > >> [...]
> >>>> > > >>
> >>>> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> >>>> > > >> Num clusters: 0; maxDistance: 0.000000
> >>>> > > >> [Dunn Index] First: Infinity
> >>>> > > >> [Davies-Bouldin Index] First: NaN
> >>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> >>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> >>>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> >>>> > > >> > From: suneel_marthi@yahoo.com
> >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> >>>> > > >> >
> >>>> > > >> > Third time's a Charm!!!
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> >>>> > > >> >
> >>>> > > >>
> >>>> > >
> >>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >>>> > > >> >
> >>>> > > >> > For those volunteering to test this, some of the things to be
> >>>> > > verified:
> >>>> > > >> >
> >>>> > > >> > a) Verify that u can unpack the release (tar or zip)
> >>>> > > >> > b) Verify u r able to compile the distro
> >>>> > > >> > c) Run through the unit tests: mvn clean test
> >>>> > > >> > d) Run the example scripts
> >>>> > > >> under $MAHOUT_HOME/examples/bin. Please run through all the
> >>>> different
> >>>> > > >> options in each script.
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > Committers
> >>>> > > >> > and PMC members:
> >>>> > > >> > ---------------------------------------
> >>>> > > >> >
> >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > Thanks and
> >>>> > > Regards.
> >>>> > > >>
> >>>> > > >
> >>>> > > >
> >>>> > >
> >>>>
> >>>
> >>>
> >>
> >
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Andrew M., Andrew P. and others,
Sebastian and I fixed a few issues today (for 0.9):
a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references to, and invocations of, algorithms that have been removed from the codebase.
b) Fixed the issue with Streaming KMeans clustering and checked in the code.
c) Resurrected the Frequent Pattern Mining implementation for 0.9.
Please check out the latest code from trunk, run a build locally, and run through the example scripts.
Thanks and Regards.
On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
*factorize-movielens-1M.sh:*
RMSE is:
0.8519064098265133
Sample recommendations:
2229
[2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
5848
[1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
3728
[572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
1252
[53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
634
[572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
4219
[53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
502
[953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
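For readers unfamiliar with the metric: the RMSE reported above is the root-mean-square error between predicted ratings and held-out actual ratings. A minimal Python sketch of that calculation follows. The function name and the sample ratings are illustrative assumptions; they are not taken from Mahout's code or from this MovieLens run.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings for illustration only (not from the run above).
predicted = [4.5, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 2.0]
print(rmse(predicted, actual))
```

A lower value means the predicted ratings sit closer to the held-out ones; on a 1-to-5 rating scale, a value around 0.85 corresponds to a typical error of a bit under one star.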
factorize-netflix.sh:
References a data set that Netflix took down after the competition; the
script should at least mention that the data set is no longer available
online.
On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:
> *clustering-syntheticcontrol.sh*
>
> *Canopy:*
> [snip]
> 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> 10.017, 17.149, 14.850, 10.890]
> 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> 16.430, 11.898, 15.366]
> 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
> 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
> 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
> 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
> 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
>
> *K-means:*
> [snip]
> 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> 36.946, 36.900, 32.593, 31.563]
> 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> 18.967, 35.473, 30.146, 45.527]
> 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> 41.024, 37.105, 45.623, 43.589]
> 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> 24.508, 34.432, 32.155, 34.839]
> 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> 18.111, 14.754, 27.386, 27.026]
> 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> 39.404, 38.034, 46.458, 44.432]
> 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> 28.215, 34.940, 36.910, 39.749]
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 16902 ms (Minutes: 0.2817)
>
> *Fuzzy k-means:*
> [snip]
> 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
> 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
> 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
> 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
> 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
>
> *Dirichlet and Meanshift:*
> Already detailed in MAHOUT-1400: the script still references the deprecated jobs.
>
>
>
> On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> *cluster-reuters.sh*
>> *k-means:*
>>
>> [snip]
>> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
>> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
>> Top Terms:
>>   banks     => 3.841823268955143
>>   bank      => 3.80633066361209
>>   debt      => 3.28065219870794
>>   said      => 2.5965700942088583
>>   he        => 2.335682813857497
>>   foreign   => 2.2217853688201403
>>   billion   => 2.1970193848291335
>>   would     => 1.9932392063955617
>>   loans     => 1.9309276792854233
>>   interest  => 1.787324501938
>>   have      => 1.762981951432578
>>   its       => 1.7615109954971866
>>   which     => 1.5822081148036862
>>   has       => 1.5600708189041956
>>   dlrs      => 1.5571038313005996
>>   finance   => 1.5539758811252924
>>   new       => 1.5176015811577555
>>   had       => 1.5138723701401844
>>   brazil    => 1.5083369853593172
>>   payments  => 1.4539044255886517
>> Weight : [props - optional]: Point:
>>
>> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
>> 0.40:0.003, 0.5:0.009, 0.57:0
>> Top Terms:
>>   vs        => 6.126130791333171
>>   net       => 4.012191567277523
>>   cts       => 3.822006848832744
>>   shr       => 3.6786004856764527
>>   mln       => 2.9011643584038698
>>   loss      => 2.788368861463607
>>   qtr       => 2.714140225051522
>>   revs      => 2.4739861236454717
>>   profit    => 1.8146888090247015
>>   note      => 1.7977163272138388
>>   dlrs      => 1.6164390808155846
>>   avg       => 1.3901765773336587
>>   shrs      => 1.3856326531419314
>>   mths      => 1.3168717272038506
>>   4th       => 1.2161158425617289
>>   oper      => 1.182419473776814
>>   year      => 1.178086061733047
>>   nine      => 1.0670554836445316
>>   3rd       => 1.041334410056592
>>   inc       => 1.0019361981554935
>> Weight : [props - optional]: Point:
>>
>>
>> Inter-Cluster Density: 0.45562152681859414
>> Intra-Cluster Density: 0.6952712632167628
>> CDbw Inter-Cluster Density: 0.0
>> CDbw Intra-Cluster Density: 16.486930227598684
>> CDbw Separation: 194.49005884464628
>>
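For anyone who wants to compare these dumps between runs, here is a minimal Python sketch for pulling the Top Terms out of a pasted excerpt. It is only an illustration: `parse_top_terms` is a hypothetical helper, not part of Mahout, and it assumes each term line contains `=>` with the weight either on the same line or wrapped onto the next one, optionally behind mail quote markers.

```python
import re

# Leading mail quote markers such as ">> " or ">>> ".
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")


def parse_top_terms(lines):
    """Return (term, weight) pairs from 'term => weight' lines.

    Assumes the weight is either on the same line as '=>' or, when a
    mail client wrapped the line, on the line that follows it.
    """
    pairs = []
    pending_term = None
    for raw in lines:
        text = QUOTE_PREFIX.sub("", raw).strip()
        if pending_term is not None:
            term, pending_term = pending_term, None
            try:
                pairs.append((term, float(text)))
                continue
            except ValueError:
                pass  # not a weight; treat this line as a fresh candidate
        if "=>" in text:
            term, _, weight = text.partition("=>")
            term, weight = term.strip(), weight.strip()
            if weight:
                pairs.append((term, float(weight)))
            else:
                pending_term = term
    return pairs


sample = [
    ">> Top Terms:",
    ">> banks =>",
    ">> 3.841823268955143",
    ">> bank => 3.80633066361209",
]
print(parse_top_terms(sample))
```

The parsed pairs can then be diffed or sorted like any other list of tuples.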
>> *fuzzy k-means:*
>> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.005, 0.02:0.002, 0.0
>> Top Terms:
>>   said      => 1.8665592354713065
>>   its       => 1.1335212213411592
>>   pct       => 1.0862816801353348
>>   dlrs      => 1.0854998884993752
>>   mln       => 1.043163996400643
>>   from      => 0.9684961110525736
>>   has       => 0.912161511978058
>>   company   => 0.8754186972808333
>>   mar       => 0.8675333452422878
>>   inc       => 0.7678617590362815
>>   would     => 0.7610968883652675
>>   he        => 0.7459988770503974
>>   which     => 0.7435613119406804
>>   year      => 0.7302840632748394
>>   u.s       => 0.7281061062439116
>>   shares    => 0.7260764102983083
>>   corp      => 0.7179807367808658
>>   new       => 0.7044203783157115
>>   stock     => 0.6962010978721442
>>   have      => 0.6464265467298506
>> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.004, 0.02:0.002, 0.02
>> Top Terms:
>>   said      => 1.864911184196927
>>   dlrs      => 1.199286689822081
>>   mln       => 1.1802134783562215
>>   pct       => 1.1529704214798124
>>   its       => 1.1184398851519701
>>   from      => 1.016647848050332
>>   company   => 0.894703604722841
>>   mar       => 0.879986159541356
>>   has       => 0.8642799128491316
>>   year      => 0.8271823503717782
>>   inc       => 0.7871293745341424
>>   corp      => 0.737705498468879
>>   which     => 0.722975201852743
>>   would     => 0.708000816484415
>>   u.s       => 0.7073294276173905
>>   billion   => 0.7055723996916351
>>   he        => 0.7042684217823294
>>   new       => 0.6834737905434939
>>   shares    => 0.6753327384172428
>>   stock     => 0.6576225144041699
>> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.006, 0.02:0.002, 0.02
>> Top Terms:
>>   said      => 1.8796076179735086
>>   its       => 1.172025965452378
>>   dlrs      => 1.130422792460914
>>   pct       => 1.082038255241358
>>   mln       => 1.0772146872767114
>>   company   => 0.9662235879639138
>>   from      => 0.9473172871605616
>>   has       => 0.9224712965830099
>>   mar       => 0.8769325856924421
>>   inc       => 0.8360245257169788
>>   shares    => 0.8334595641384324
>>   stock     => 0.7704621839612175
>>   corp      => 0.7682400250301806
>>   which     => 0.7389988207856137
>>   would     => 0.7339708917389389
>>   year      => 0.7088414843731325
>>   new       => 0.7038109468655172
>>   he        => 0.6993994455501005
>>   u.s       => 0.6772649147622415
>>   share     => 0.6241804830055171
>>
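The fuzzy k-means clusters quoted above list almost the same generic terms, whereas the two k-means clusters barely overlap. One way to put a number on that is sketched below in Python. It assumes that Jaccard similarity of the top-20 term sets is a fair proxy for how distinct two clusters are; this is my own check, not something the example script computes, and the term lists are copied by hand from the output in this thread.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


# Top terms copied from the cluster dumps quoted in this thread.
kmeans_vl_19482 = ("banks bank debt said he foreign billion would loans interest "
                   "have its which has dlrs finance new had brazil payments").split()
kmeans_vl_7320 = ("vs net cts shr mln loss qtr revs profit note "
                  "dlrs avg shrs mths 4th oper year nine 3rd inc").split()
fuzzy_sv_18539 = ("said its pct dlrs mln from has company mar inc "
                  "would he which year u.s shares corp new stock have").split()
fuzzy_sv_9431 = ("said dlrs mln pct its from company mar has year "
                 "inc corp which would u.s billion he new shares stock").split()

# k-means: only "dlrs" is shared, so 1 of 39 distinct terms.
print(f"k-means pair:       {jaccard(kmeans_vl_19482, kmeans_vl_7320):.3f}")
# fuzzy k-means: 19 of 21 distinct terms are shared.
print(f"fuzzy k-means pair: {jaccard(fuzzy_sv_18539, fuzzy_sv_9431):.3f}")
```

A value near 1 for the fuzzy pair means those clusters are dominated by the same high-frequency words, so the top terms alone say little about what separates them.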
>> *lda:*
>>
>> [snip]
>> 21539
>> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
>> 21540
>> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
>> 21541
>> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
>> 21542
>> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
>> 21543
>> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
>> 21544
>> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
>> 21545
>> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
>> 21546
>> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
>> 21547
>> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
>> 21548
>> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
>> 21549
>> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
>> 21550
>> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
>> 21551
>> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
>> 21552
>> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
>> 21553
>> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
>> 21554
>> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
>> 21555
>> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
>> 21556
>> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
>> 21557
>> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
>> 21558
>> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
>> 21559
>> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
>> 21560
>> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
>> 21561
>> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
>> 21562
>> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
>> 21563
>> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
>> 21564
>> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
>> 21565
>> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
>> 21566
>> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
>> 21567
>> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
>> 21568
>> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
>> 21569
>> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
>> 21570
>> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
>> 21571
>> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
>> 21572
>> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
>> 21573
>> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
>> 21574
>> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
>> 21575
>> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
>> 21576
>> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
>> 21577
>> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>>
>>
>> *Streaming k-means:*
>>
>> [snip]
>> INFO: Number of Centroids: 0
>> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
>> WARNING: job_local23982482_0001
>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>
>> [snip]
>>
>> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> Num clusters: 0; maxDistance: 0.000000
>> [Dunn Index] First: Infinity
>> [Davies-Bouldin Index] First: NaN
>> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
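A note on reading the streaming k-means exception quoted above: the `%.1f %% of %d` placeholders show up literally because Guava's `Preconditions.checkArgument` only substitutes `%s` and appends any leftover arguments in square brackets. So the message amounts to "asked for about 10% of 0 vectors", which lines up with the `Number of Centroids: 0` line just before it. The Python sketch below imitates that formatting behavior purely for illustration; `lenient_format` is a made-up name, not a Guava or Mahout function, and the last line shows what the message author presumably intended (my assumption).

```python
def lenient_format(template, *args):
    """Imitate Guava-style message formatting.

    Only '%s' placeholders are replaced, in order; arguments without a
    matching placeholder are appended in square brackets.
    """
    parts = []
    remaining = list(args)
    start = 0
    while remaining:
        idx = template.find("%s", start)
        if idx == -1:
            break
        parts.append(template[start:idx])
        parts.append(str(remaining.pop(0)))
        start = idx + 2
    parts.append(template[start:])
    if remaining:
        parts.append(" [" + ", ".join(str(a) for a in remaining) + "]")
    return "".join(parts)


template = ("Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test")

# No '%s' in the template, so both arguments end up in the bracketed tail.
message = lenient_format(template, 10.000000149011612, 0)
print(message)

# printf-style formatting of the same template, for comparison.
print(template % (10.000000149011612, 0))
```

Either way the underlying problem is the same: the reducer was handed zero vectors to split, so the failure is upstream of the message formatting.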
>> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
>> andrew.musselman@gmail.com> wrote:
>>
>>> *classify-20newsgroups.sh*
>>>
>>> *Complementary naive bayes:*
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 11207 98.9406%
>>> Incorrectly Classified Instances : 120 1.0594%
>>> Total Classified Instances : 11327
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>>    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
>>>  475   0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0 | 478 a = alt.atheism
>>>    0 597   1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0 | 605 b = comp.graphics
>>>    0   1 620   3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0 | 627 c = comp.os.ms-windows.misc
>>>    1   1   1 593   2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0 | 599 d = comp.sys.ibm.pc.hardware
>>>    0   1   1   0 568   0   1   0   0   0   1   1   2   0   0   0   0   1   0   0 | 576 e = comp.sys.mac.hardware
>>>    0   4   2   0   0 581   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 587 f = comp.windows.x
>>>    0   0   0   1   2   0 571   3   0   0   1   1   4   1   0   0   0   0   0   0 | 584 g = misc.forsale
>>>    0   0   0   1   0   0   0 589   1   0   0   1   1   0   0   0   0   0   0   0 | 593 h = rec.autos
>>>    0   0   0   0   0   0   0   1 565   0   0   0   0   0   1   0   0   0   0   0 | 567 i = rec.motorcycles
>>>    0   0   0   0   0   0   0   0   0 600   2   0   0   0   1   0   0   0   0   0 | 603 j = rec.sport.baseball
>>>    0   0   0   0   0   0   0   0   0   1 584   0   0   0   0   0   0   0   0   0 | 585 k = rec.sport.hockey
>>>    0   0   0   0   0   0   0   0   0   0   0 579   0   0   0   0   0   1   0   0 | 580 l = sci.crypt
>>>    0   0   0   1   3   0   2   0   0   2   0   0 567   1   2   1   0   0   0   0 | 579 m = sci.electronics
>>>    0   0   0   0   0   0   0   0   0   0   0   0   1 605   0   0   0   0   0   0 | 606 n = sci.med
>>>    0   0   0   0   0   0   0   0   0   0   0   0   0   0 602   0   0   0   0   0 | 602 o = sci.space
>>>    0   0   0   0   0   0   0   0   0   0   0   0   0   1   0 602   0   0   1   0 | 604 p = soc.religion.christian
>>>    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 556   0   0   0 | 556 q = talk.politics.mideast
>>>    0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0 568   0   0 | 571 r = talk.politics.guns
>>>   11   0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4 338   2 | 369 s = talk.religion.misc
>>>    0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0 447 | 456 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.9806
>>> Accuracy 98.9406%
>>> Reliability 94.0932%
>>> Reliability (standard deviation) 0.2163
>>>
>>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 15870 ms (Minutes: 0.2645)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>>>
>>> [snip]
>>>
>>> INFO: Complementary Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 6715 89.3071%
>>> Incorrectly Classified Instances : 804 10.6929%
>>> Total Classified Instances : 7519
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>>    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
>>>  298   0   0   0   0   0   0   0   0   1   0   0   0   1   2   5   1   0  13   0 | 321 a = alt.atheism
>>>    0 298  11   6   1  12   2   2   1   1   3   8   3   4   2   4   1   4   4   1 | 368 b = comp.graphics
>>>    1  17 286  16   4   9   6   3   2   0   1   0   1   7   1   0   2   1   0   1 | 358 c = comp.os.ms-windows.misc
>>>    2   6  11 309   9   5  14   8   1   0   2   0   6   4   2   0   1   2   1   0 | 383 d = comp.sys.ibm.pc.hardware
>>>    0  10   8   7 334   7   5   5   2   0   3   0   2   1   1   0   1   1   0   0 | 387 e = comp.sys.mac.hardware
>>>    1  13   7   8   2 355   2   0   2   0   0   5   1   1   3   0   0   1   0   0 | 401 f = comp.windows.x
>>>    0   7  11  29  12   9 268  16   8   4   3   2   6   4   2   1   3   1   2   3 | 391 g = misc.forsale
>>>    0   1   0   0   3   0   7 362   8   2   2   1   2   0   2   0   1   2   0   4 | 397 h = rec.autos
>>>    0   0   0   1   0   0   1   0 423   0   0   0   2   1   0   1   0   0   0   0 | 429 i = rec.motorcycles
>>>    0   0   1   0   0   0   0   2   2 371   8   0   2   3   0   2   0   0   0   0 | 391 j = rec.sport.baseball
>>>    0   0   1   0   0   0   1   0   0   2 409   0   0   0   0   0   0   0   0   1 | 414 k = rec.sport.hockey
>>>    0   0   1   2   1   0   1   0   0   0   0 404   0   0   0   0   0   1   0   1 | 411 l = sci.crypt
>>>    0   5   4  11   1   3   7   9   2   5   3   3 339   2   6   0   1   1   2   1 | 405 m = sci.electronics
>>>    0   4   0   1   0   0   0   1   0   1   1   0   3 367   3   1   2   0   0   0 | 384 n = sci.med
>>>    0   1   2   0   1   0   2   0   0   1   0   0   1   1 375   0   1   0   0   0 | 385 o = sci.space
>>>    4   2   1   1   0   0   1   1   2   0   0   1   1   5   1 367   4   0   1   1 | 393 p = soc.religion.christian
>>>    0   1   0   0   0   0   0   0   0   2   0   0   0   0   0   2 378   0   1   0 | 384 q = talk.politics.mideast
>>>    0   0   0   0   0   2   1   1   1   1   0   3   0   3   0   0   2 319   2   4 | 339 r = talk.politics.guns
>>>   32   0   0   1   0   0   0   0   0   1   1   1   0   2   2  26   5   7 175   6 | 259 s = talk.religion.misc
>>>    0   0   0   2   0   0   0   0   0   1   2   2   0   1   2   1  10  18   2 278 | 319 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.8594
>>> Accuracy 89.3071%
>>> Reliability 84.611%
>>> Reliability (standard deviation) 0.2148
>>>
>>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>>
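As a quick sanity check on the two Complementary NB summaries above, the percentages follow directly from the instance counts, and the gap between the first run and the run after the 'Testing on holdout set' step is roughly 9.6 points. A small Python sketch (the helper name `accuracy_pct` is just for illustration, and labelling the first run "train" is my reading of the script output, not something the log states):

```python
def accuracy_pct(correct, total):
    """Percentage of correctly classified instances."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts copied from the two Complementary NB summaries in this thread.
train = accuracy_pct(11207, 11327)   # first run, before the holdout step
holdout = accuracy_pct(6715, 7519)   # run on the holdout set

print(f"train   {train:.4f}%")
print(f"holdout {holdout:.4f}%")
print(f"gap     {train - holdout:.4f} points")
```

The same function reproduces the Standard NB figures from their counts as well.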
>>>
>>> *Naive bayes:*
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 11286 99.0869%
>>> Incorrectly Classified Instances : 104 0.9131%
>>> Total Classified Instances : 11390
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>>    a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
>>>  474   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2   1 | 477 a = alt.atheism
>>>    0 566   0   2   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 569 b = comp.graphics
>>>    0  10 590  29   2   4   1   0   0   0   0   0   1   0   0   0   0   0   0   1 | 638 c = comp.os.ms-windows.misc
>>>    0   0   0 596   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 596 d = comp.sys.ibm.pc.hardware
>>>    0   0   0   0 575   0   1   0   0   0   0   0   1   0   0   0   0   0   0   0 | 577 e = comp.sys.mac.hardware
>>>    0   2   2   2   0 593   1   0   0   0   0   0   0   0   1   0   0   0   0   0 | 601 f = comp.windows.x
>>>    0   0   0   1   0   0 589   1   0   0   1   0   2   0   0   0   0   0   0   0 | 594 g = misc.forsale
>>>    0   0   0   0   0   0   0 594   0   0   0   0   0   0   0   0   0   0   0   0 | 594 h = rec.autos
>>>    0   0   0   0   0   0   0   0 611   0   0   0   0   0   0   0   0   0   0   0 | 611 i = rec.motorcycles
>>>    0   0   0   0   0   0   0   0   0 616   1   0   0   0   0   0   0   0   0   0 | 617 j = rec.sport.baseball
>>>    0   0   0   0   0   0   1   0   0   0 620   0   0   0   0   0   0   0   0   0 | 621 k = rec.sport.hockey
>>>    0   0   0   0   0   0   0   0   0   0   0 580   0   0   0   0   0   1   0   0 | 581 l = sci.crypt
>>>    0   0   0   3   1   0   0   0   0   0   0   0 571   0   0   0   0   0   0   0 | 575 m = sci.electronics
>>>    0   0   0   0   0   0   0   0   0   0   0   0   2 583   0   0   0   0   0   0 | 585 n = sci.med
>>>    0   0   0   0   0   0   0   0   0   0   0   0   0   1 599   0   0   0   0   0 | 600 o = sci.space
>>>    0   1   0   0   0   0   0   0   0   0   0   0   0   0   0 615   0   0   0   0 | 616 p = soc.religion.christian
>>>    1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1 560   0   0   0 | 562 q = talk.politics.mideast
>>>    0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0 548   0   1 | 551 r = talk.politics.guns
>>>   10   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   2 344   1 | 359 s = talk.religion.misc
>>>    0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   2   0 462 | 466 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.9847
>>> Accuracy 99.0869%
>>> Reliability 94.3334%
>>> Reliability (standard deviation) 0.2169
>>>
>>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 14304 ms (Minutes: 0.2384)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>>
>>> [snip]
>>>
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 6718 90.1019%
>>> Incorrectly Classified Instances : 738 9.8981%
>>> Total Classified Instances : 7456
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 294 0 0 0 0 0 0 0 0
>>> 0 0 2 0 1 1 6 1 1 16
>>> 0 | 322 a = alt.atheism
>>> 0 345 6 14 6 11 6 0 0
>>> 0 0 5 7 1 3 0 0 0 0
>>> 0 | 404 b = comp.graphics
>>> 2 29 177 78 22 19 9 1 0
>>> 0 0 4 2 0 1 1 0 0 1
>>> 1 | 347 c = comp.os.ms-windows.misc
>>> 1 9 2 335 18 2 10 0 0
>>> 0 1 0 8 0 0 0 0 0 0
>>> 0 | 386 d = comp.sys.ibm.pc.hardware
>>> 1 4 2 13 347 3 5 1 0
>>> 0 1 0 7 1 0 0 0 1 0
>>> 0 | 386 e = comp.sys.mac.hardware
>>> 0 20 0 4 0 352 4 0 0
>>> 0 0 0 1 1 3 0 1 0 1
>>> 0 | 387 f = comp.windows.x
>>> 0 2 0 21 5 1 323 7 2
>>> 2 0 2 12 0 3 0 0 0 0
>>> 1 | 381 g = misc.forsale
>>> 0 1 0 0 1 0 15 363 8
>>> 1 0 0 4 1 0 0 0 1 0
>>> 1 | 396 h = rec.autos
>>> 0 1 0 0 0 0 6 6 370
>>> 0 0 0 0 1 0 0 0 0 1
>>> 0 | 385 i = rec.motorcycles
>>> 1 0 0 1 1 0 2 1 2
>>> 362 5 0 2 0 0 0 0 0 0
>>> 0 | 377 j = rec.sport.baseball
>>> 0 0 0 1 2 0 0 0 0
>>> 3 371 0 0 0 0 0 0 0 0
>>> 1 | 378 k = rec.sport.hockey
>>> 0 3 1 0 1 0 2 0 0
>>> 0 0 396 0 1 0 0 1 1 1
>>> 3 | 410 l = sci.crypt
>>> 0 7 0 7 7 2 6 4 0
>>> 0 0 1 369 2 2 0 0 0 0
>>> 2 | 409 m = sci.electronics
>>> 0 3 0 2 1 0 2 0 0
>>> 0 0 1 4 383 4 0 0 1 0
>>> 4 | 405 n = sci.med
>>> 0 5 0 0 1 0 3 0 0
>>> 0 0 0 1 0 374 1 0 0 1
>>> 1 | 387 o = sci.space
>>> 6 2 0 1 1 0 0 1 0
>>> 1 0 0 1 5 0 352 2 1 7
>>> 1 | 381 p = soc.religion.christian
>>> 1 1 0 0 0 0 0 0 0
>>> 0 1 0 0 0 0 0 373 1 0
>>> 1 | 378 q = talk.politics.mideast
>>> 0 0 0 0 0 0 1 0 1
>>> 0 0 2 0 0 0 0 0 346 2
>>> 7 | 359 r = talk.politics.guns
>>> 26 1 0 1 0 0 0 2 0
>>> 1 1 0 0 1 1 20 2 6 200
>>> 7 | 269 s = talk.religion.misc
>>> 1 0 0 0 0 0 0 2 0
>>> 0 1 0 0 2 2 0 1 14 0
>>> 286 | 309 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.8726
>>> Accuracy 90.1019%
>>> Reliability 85.4491%
>>> Reliability (standard deviation) 0.2222
>>>
>>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>>
>>> *SGD:*
>>> 7532 test files
>>>
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 5649 75%
>>> Incorrectly Classified Instances : 1883 25%
>>> Total Classified Instances : 7532
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 186 6 3 10 5 0 33 4 13
>>> 15 7 1 24 15 3 15 5 5 29
>>> 15 | 394 a = sci.space
>>> 5 309 0 3 2 5 0 0 0
>>> 1 9 21 2 0 0 18 4 4 1
>>> 1 | 385 b = comp.sys.mac.hardware
>>> 4 1 101 3 0 1 63 0 7
>>> 0 1 1 5 16 3 0 3 7 1
>>> 34 | 251 c = talk.religion.misc
>>> 11 12 1 265 1 10 3 0 0
>>> 17 10 11 5 2 0 11 3 6 21
>>> 0 | 389 d = comp.graphics
>>> 2 1 1 0 349 2 3 0 3
>>> 2 6 1 5 1 0 2 15 2 1
>>> 2 | 398 e = rec.motorcycles
>>> 7 20 3 19 2 254 6 0 2
>>> 11 2 39 7 2 0 4 2 2 9
>>> 3 | 394 f = comp.os.ms-windows.misc
>>> 2 1 13 0 0 0 247 0 1
>>> 1 3 0 6 2 4 0 2 3 5
>>> 29 | 319 g = alt.atheism
>>> 1 1 0 0 2 0 2 361 0
>>> 1 2 0 2 0 0 1 3 22 0
>>> 1 | 399 h = rec.sport.hockey
>>> 3 0 3 1 0 0 5 0 161
>>> 0 1 2 12 102 0 0 1 2 11
>>> 6 | 310 i = talk.politics.misc
>>> 2 8 0 19 0 19 0 0 1
>>> 294 10 11 4 2 0 5 0 3 11
>>> 6 | 395 j = comp.windows.x
>>> 2 10 0 1 1 0 0 0 0
>>> 1 347 13 2 1 0 5 3 2 2
>>> 0 | 390 k = misc.forsale
>>> 1 36 0 6 1 25 0 0 1
>>> 6 10 257 2 1 0 34 6 0 6
>>> 0 | 392 l = comp.sys.ibm.pc.hardware
>>> 2 2 2 2 1 0 12 0 0
>>> 6 10 4 312 5 2 13 11 3 3
>>> 6 | 396 m = sci.med
>>> 2 0 3 2 1 0 0 1 13
>>> 0 5 1 2 314 2 0 2 2 10
>>> 4 | 364 n = talk.politics.guns
>>> 1 0 2 1 1 0 34 1 33
>>> 1 3 0 1 8 271 1 4 5 6
>>> 3 | 376 o = talk.politics.mideast
>>> 3 14 0 8 2 8 3 1 1
>>> 7 12 29 6 2 1 245 13 2 32
>>> 4 | 393 p = sci.electronics
>>> 3 3 0 2 11 0 1 0 2
>>> 1 11 6 4 2 0 11 330 4 4
>>> 1 | 396 q = rec.autos
>>> 0 0 1 0 1 0 4 12 3
>>> 1 3 0 0 0 0 5 6 359 1
>>> 1 | 397 r = rec.sport.baseball
>>> 0 1 0 0 0 1 0 0 3
>>> 3 0 0 3 2 1 6 1 6 366
>>> 3 | 396 s = sci.crypt
>>> 0 2 11 1 1 0 40 0 1
>>> 2 3 4 2 1 0 5 0 2 2
>>> 321 | 398 t = soc.religion.christian
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.7073
>>> Accuracy 75%
>>> Reliability 70.6238%
>>> Reliability (standard deviation) 0.2187
>>> Log-likelihood mean : -1.1182
>>> 25%-ile : -1.6911
>>> 75%-ile : -0.0803
>>>
>>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>>
>>>
>>>
>>>
>>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>>
>>>> Thanks Andrew for reporting that. I rolled back the release to fix this
>>>> and a few other issues.
>>>>
>>>> We have removed asf-examples*.sh from trunk as the sample file at the
>>>> URL mentioned in your email is not available.
>>>> This is something we need to fix and restore in 1.0.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
>>>> ap.dev@outlook.com> wrote:
>>>>
>>>> from the asf-email-examples.sh script:
>>>>
>>>> # You will need to download or otherwise obtain some or all of the
>>>> # Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
>>>> # to use this script.
>>>> # To obtain a full copy you will need to launch an EC2 instance and
>>>> # mount the dataset to download it, otherwise you can get a sample of it at
>>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> It looks like the:
>>>>
>>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> link is down.
>>>>
>>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>>
>>>>
>>>>
>>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>>> > From: andrew.musselman@gmail.com
>>>> > To: dev@mahout.apache.org
>>>> >
>>>> > Sure thing; continuing to smoke test the other examples tonight
>>>> >
>>>> >
>>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com>wrote:
>>>> >
>>>> > > Thanks Andrew M., see that some of the example scripts need to be
>>>> fixed as
>>>> > > they still refer to the deprecated algorithms.
>>>> > > See that the Streaming KMeans has failed for you as well.
>>>> > >
>>>> > > I'll be rolling back the release today to fix these issues.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
>>>> 64-bit
>>>> > > Linux AMI from tarball.
>>>> > >
>>>> > > All tests pass.
>>>> > >
>>>> > > *Output of examples:*
>>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>>> > > <http://mahout.apache.org>:*
>>>> > > *recommendations:*
>>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
>>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
>>>> > > 1
>>>> > >
>>>> > >
>>>> [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>>> > > 4
>>>> > >
>>>> > >
>>>> [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>>> > > 6
>>>> > >
>>>> > >
>>>> [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>>> > > 8
>>>> > > [12758:1.0,19409:1.0,11112:1.0]
>>>> > > 11
>>>> > >
>>>> > >
>>>> [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>>> > > 14
>>>> > >
>>>> > >
>>>> [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>>> > > 15
>>>> > >
>>>> > >
>>>> [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>>> > > 16
>>>> > >
>>>> > >
>>>> [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>>> > > 18
>>>> > >
>>>> > >
>>>> [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>>> > > 20
>>>> > >
>>>> > >
>>>> [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>>> > > [snip]
>>>> > >
>>>> > > *clustering; kmeans:*
>>>> > > [snip]
>>>> > > Weight : [props - optional]: Point:
>>>> > > 1.0 :
>>>> > > [distance-squared=1.0193102046188427]:
>>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
>>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>>> 7573:0.204,
>>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>>> 9779:0.159,
>>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>>> > > 1.0 : [distance-squared=0.9823018320457279]:
>>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>>> 5336:0.106,
>>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>>> 7832:0.072,
>>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>>> > > 1.0 : [distance-squared=0.9509142993214911]:
>>>> > >
>>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
>>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
>>>> > > 4419:0.076,
>>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>>>> 7235:0.048,
>>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>>> 7683:0.077,
>>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>>> 10225:0.081,
>>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
>>>> > > 43685:0.086, 44077:0.308,
>>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>>> > > [snip]
>>>> > >
>>>> > > *clustering; dirichlet:*
>>>> > > Get this complaint:
>>>> > > Running Dirichlet with K = 8
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB:
>>>> > >
>>>> > >
>>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class:
>>>> dirichlet
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props
>>>> found on
>>>> > > classpath, will use command-line arguments only
>>>> > > Unknown program 'dirichlet' chosen.
>>>> > >
>>>> > > *clustering: minhash:*
>>>> > > Running Minhash
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB:
>>>> > >
>>>> > >
>>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found
>>>> on
>>>> > > classpath, will use command-line arguments only
>>>> > > Unknown program 'minhash' chosen.
>>>> > >
>>>> > > *classification; standard:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances : 5384 87.7874%
>>>> > > Incorrectly Classified Instances : 749 12.2126%
>>>> > > Total Classified Instances : 6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a b c d <--Classified as
>>>> > > 2949 7 531 25 | 3512 a = dev
>>>> > > 0 0 0 0 | 0 b = general
>>>> > > 99 8 1763 8 | 1878 c = user
>>>> > > 41 1 29 672 | 743 d = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa 0.7877
>>>> > > Accuracy 87.7874%
>>>> > > Reliability 53.658%
>>>> > > Reliability (standard deviation) 0.4911
>>>> > >
>>>> > > *classification; complementary:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances : 5530 90.1679%
>>>> > > Incorrectly Classified Instances : 603 9.8321%
>>>> > > Total Classified Instances : 6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a b c d <--Classified as
>>>> > > 3168 0 276 68 | 3512 a = dev
>>>> > > 0 0 0 0 | 0 b = general
>>>> > > 196 0 1652 30 | 1878 c = user
>>>> > > 25 0 8 710 | 743 d = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa 0.8259
>>>> > > Accuracy 90.1679%
>>>> > > Reliability 54.7459%
>>>> > > Reliability (standard deviation) 0.5005
>>>> > >
>>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms
>>>> (Minutes:
>>>> > > 0.34836666666666666)
>>>> > >
>>>> > > *classification; sgd, with three categories:*
>>>> > > Running SGD Training
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop
>>>> > > and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB:
>>>> > >
>>>> > >
>>>> /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
>>>> > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on
>>>> classpath,
>>>> > > will use command-line arguments only
>>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
>>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
>>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
>>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
>>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>>> > > 24168 training files
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000
>>>> > > 2
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00
>>>> > > 0.00 0.00 0.0000000 0.0000000 12
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000
>>>> > > 0.0000000 40
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100
>>>> > > 0.000
>>>> > > 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00
>>>> > > 0.00 0.00 0.0000000 0.0000000 300
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000
>>>> > > 0.0000000 800
>>>> > > 0.000 0.00 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1000 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1200 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1400 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1500 -0.607 75.78 none
>>>> > > 0.24 43686.00 17924.00 329.50
>>>> > > 1.0571799e-08
>>>> > > 1.0032261e-08 2000 -0.487 82.65 none
>>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
>>>> > > 1.0011902e-08 2500 -0.439 83.90 none
>>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
>>>> > > 1.0011902e-08 3000 -0.439 83.90 none
>>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08
>>>> > > 1.0000001e-08 4000 -0.351 88.14 none
>>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08
>>>> > > 1.0000000e-08 5000 -0.378 87.10 none
>>>> > > 0.32 50635.00 36461.00 437.09
>>>> > > 1.0556652e-08
>>>> > > 1.0000001e-08 6000 -0.372 86.89 none
>>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08
>>>> > > 1.0000001e-08 7000 -0.334 89.26 none
>>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08
>>>> > > 1.0000000e-08 8000 -0.368 87.52 none
>>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08
>>>> > > 1.0000000e-08 10000 -0.374 87.39 none
>>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08
>>>> > > 1.0000000e-08 12000 -0.298 88.26 none
>>>> > > Exception in thread "main" java.lang.IllegalStateException:
>>>> > > java.lang.ArrayIndexOutOfBoundsException:
>>>> > > 2
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>>> > > at
>>>> > >
>>>> org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>>> > > at
>>>> > >
>>>> org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> Method)
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>> > > at
>>>> > > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>> > > at
>>>> > > org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> Method)
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > > at
>>>> > > org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > > at
>>>> > > org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>>> > >
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>>> > > at
>>>> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > > at
>>>> > >
>>>> > >
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> > > at
>>>> > >
>>>> > >
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> > > at java.lang.Thread.run(Thread.java:701)
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > > Trying out the build today
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com
>>>> > > >wrote:
>>>> > > >
>>>> > > >> This is an issue (trivial one though) that needs to be fixed for
>>>> 0.9
>>>> > > >> Release, will be rerolling the release today (in the next few
>>>> hrs) and
>>>> > > >> putting out a new release candidate in staging.
>>>> > > >>
>>>> > > >> Thanks for reporting this Andrew P.
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>>> > > ap.dev@outlook.com>
>>>> > > >> wrote:
>>>> > > >>
>>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
>>>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
>>>> > > >> have run into some problems because of the Hadoop setup. I ran into some
>>>> > > >> problems in the example scripts, particularly with
>>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
>>>> > > >> examples when I'm sure I've got Hadoop set up right.
>>>> > > >>
>>>> > > >>
>>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
>>>> "amd64",
>>>> > > >> family: "unix"
>>>> > > >> $MAHOUT_LOCAL=true
>>>> > > >> Hadoop 2.2.0
>>>> > > >>
>>>> > > >>
>>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed
>>>> (tar)
>>>> > > >> [passed ]
>>>> > > >>
>>>> > > >> b) Verify u r able to compile the
>>>> > > distro
>>>> > > >>
>>>> > > >> mvn compile- [passed with warnings]
>>>> > > >>
>>>> > > >> [WARNING] Expected all dependencies to require Scala
>>>> version: 2.9.3
>>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires
>>>> scala
>>>> > > >> version: 2.9.3
>>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala
>>>> > > >> version: 2.9.2
>>>> > > >> [WARNING] Multiple versions of scala libraries detected!
>>>> > > >>
>>>> > > >> c) Run through the unit tests: mvn clean test
>>>> > > >> mvn clean test [passed]
>>>> > > >>
>>>> > > >> d) Run the
>>>> > > >> example scripts under $MAHOUT_HOME/examples/bin.
>>>> > > >> Please run through all the different options in each script
>>>> > > >>
>>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
>>>> > > >>
>>>> > > >>
>>>> > > ./cluster-syntheticcontrol.sh ->1 [works]
>>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
>>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>>> > > >> [...]
>>>> > > >> WARNING: Unable to add class:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >> java.lang.ClassNotFoundException:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >> at
>>>> > > >> java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >> at java.security.AccessController.doPrivileged(Native
>>>> Method)
>>>> > > >> at
>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >> at
>>>> > > java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >> at
>>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >> at java.lang.Class.forName0(Native Method)
>>>> > > >> at java.lang.Class.forName(Class.java:171)
>>>> > > >> at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >> at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>
>>>> > > >>
>>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>>> > > >>
>>>> > > >> WARNING: Unable to add class:
>>>> > > >>
>>>> > > org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >> java.lang.ClassNotFoundException:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >> at java.security.AccessController.doPrivileged(Native
>>>> Method)
>>>> > > >> at
>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >> at
>>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >> at java.lang.Class.forName0(Native Method)
>>>> > > >> at
>>>> > > java.lang.Class.forName(Class.java:171)
>>>> > > >> at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >> at
>>>> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >> WARNING: No
>>>> > > >>
>>>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found
>>>> > > on
>>>> > > >> classpath, will use command-line arguments only
>>>> > > >> Unknown program
>>>> > > >> 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job'
>>>> chosen.
>>>> > > >>
>>>> > > >>
>>>> > > >> ./classify-20newsgroups.sh ->1 [works]
>>>> > > >> ./classify-20newsgroups.sh ->2 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >> cluster-reuters.sh ->1 [works]
>>>> > > >>
>>>> > > cluster-reuters.sh ->2 [works]
>>>> > > >> cluster-reuters.sh ->3 [works]
>>>> > > >>
>>>> > > >> Same error as noted previously in the thread:
>>>> > > >>
>>>> > > >> cluster-reuters.sh ->4 [0 clusters]
>>>> > > >>
>>>> > > >> [...]
>>>> > > >>
>>>> > > >> WARNING: No qualcluster.props found on classpath, will use
>>>> > > >> command-line arguments only
>>>> > > >> Num clusters: 0; maxDistance: 0.000000
>>>> > > >> [Dunn Index]
>>>> > > >> First: Infinity
>>>> > > >> [Davies-Bouldin Index] First: NaN
>>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
>>>> > > >> cluster,distance.mean,distance.sd
>>>> > > >>
>>>> > >
>>>> > >
>>>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>>>> > > >> > From: suneel_marthi@yahoo.com
>>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>>>> > > >> >
>>>> > > >> > Third time's a Charm!!!
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Here's the new URL for Mahout 0.9 Release:
>>>> > > >> >
>>>> > > >>
>>>> > >
>>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>>> > > >> >
>>>> > > >> > For those volunteering to test this, some of the things to be
>>>> > > verified:
>>>> > > >> >
>>>> > > >> > a) Verify that u can unpack the release (tar or zip)
>>>> > > >> > b) Verify u r able to compile the distro
>>>> > > >> > c) Run through the unit tests: mvn clean test
>>>> > > >> > d) Run the example scripts
>>>> > > >> under $MAHOUT_HOME/examples/bin. Please run through all the
>>>> different
>>>> > > >> options in each script.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Committers
>>>> > > >> > and PMC members:
>>>> > > >> > ---------------------------------------
>>>> > > >> >
>>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Thanks and
>>>> > > Regards.
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > >
>>>>
>>>
>>>
>>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Andrew M., Andrew P. and others,
Sebastian and I fixed a few issues today (for 0.9):
a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references to, and invocations of, algorithms that have been removed from the codebase.
b) Fixed the issue with Streaming KMeans clustering and checked in the code.
c) Resurrected the Frequent Pattern Mining implementation for 0.9.
Please check out the latest code from trunk, run a build locally, and run through the example scripts.
Thanks and Regards.
On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <an...@gmail.com> wrote:
*factorize-movielens-1M.sh:*
RMSE is:
0.8519064098265133
Sample recommendations:
2229 [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
5848 [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
3728 [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
1252 [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
634 [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
4219 [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
502 [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
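For anyone comparing RMSE figures between runs, here is a minimal sketch of how a root-mean-square error is computed from (predicted, actual) rating pairs. The function name and the sample ratings are made up for illustration; they are not taken from Mahout's code or from this MovieLens run.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    # Mean of the squared differences, then the square root of that mean.
    squared_error = sum((predicted - actual) ** 2 for predicted, actual in pairs)
    return math.sqrt(squared_error / len(pairs))


# Hypothetical ratings, for illustration only.
sample = [(4.5, 5.0), (3.0, 3.0), (2.5, 4.0)]
print(rmse(sample))
```

A lower value means the predicted ratings sit closer to the actual ones, so the number is only comparable between runs that evaluate on the same set of ratings.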
*factorize-netflix.sh:*
References a no-longer-available data set that Netflix took down after the
competition; the script should at least mention that the data set is no
longer "online".
On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:
> *clustering-syntheticcontrol.sh*
>
> *Canopy:*
> [snip]
> 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> 10.017, 17.149, 14.850, 10.890]
> 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> 16.430, 11.898, 15.366]
> 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
> 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
> 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
> 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
> 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
>
> *K-means:*
> [snip]
> 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> 36.946, 36.900, 32.593, 31.563]
> 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> 18.967, 35.473, 30.146, 45.527]
> 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> 41.024, 37.105, 45.623, 43.589]
> 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> 24.508, 34.432, 32.155, 34.839]
> 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> 18.111, 14.754, 27.386, 27.026]
> 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> 39.404, 38.034, 46.458, 44.432]
> 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> 28.215, 34.940, 36.910, 39.749]
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 16902 ms (Minutes: 0.2817)
>
> *Fuzzy k-means:*
> [snip]
> 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
> 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
> 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
> 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
> 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
>
> *Dirichlet and Meanshift:*
> Already detailed in M-1400, deprecated jobs still referenced.
>
>
>
> On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> *cluster-reuters.sh*
>> *k-means:*
>>
>> [snip]
>> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
>> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
>> Top Terms:
>> banks => 3.841823268955143
>> bank => 3.80633066361209
>> debt => 3.28065219870794
>> said => 2.5965700942088583
>> he => 2.335682813857497
>> foreign => 2.2217853688201403
>> billion => 2.1970193848291335
>> would => 1.9932392063955617
>> loans => 1.9309276792854233
>> interest => 1.787324501938
>> have => 1.762981951432578
>> its => 1.7615109954971866
>> which => 1.5822081148036862
>> has => 1.5600708189041956
>> dlrs => 1.5571038313005996
>> finance => 1.5539758811252924
>> new => 1.5176015811577555
>> had => 1.5138723701401844
>> brazil => 1.5083369853593172
>> payments => 1.4539044255886517
>> Weight : [props - optional]: Point:
>>
>> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
>> 0.40:0.003, 0.5:0.009, 0.57:0
>> Top Terms:
>> vs => 6.126130791333171
>> net => 4.012191567277523
>> cts => 3.822006848832744
>> shr => 3.6786004856764527
>> mln => 2.9011643584038698
>> loss => 2.788368861463607
>> qtr => 2.714140225051522
>> revs => 2.4739861236454717
>> profit => 1.8146888090247015
>> note => 1.7977163272138388
>> dlrs => 1.6164390808155846
>> avg => 1.3901765773336587
>> shrs => 1.3856326531419314
>> mths => 1.3168717272038506
>> 4th => 1.2161158425617289
>> oper => 1.182419473776814
>> year => 1.178086061733047
>> nine => 1.0670554836445316
>> 3rd => 1.041334410056592
>> inc => 1.0019361981554935
>> Weight : [props - optional]: Point:
>>
>>
>> Inter-Cluster Density: 0.45562152681859414
>> Intra-Cluster Density: 0.6952712632167628
>> CDbw Inter-Cluster Density: 0.0
>> CDbw Intra-Cluster Density: 16.486930227598684
>> CDbw Separation: 194.49005884464628
>>
>> *fuzzy k-means:*
>> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.005, 0.02:0.002, 0.0
>> Top Terms:
>> said => 1.8665592354713065
>> its => 1.1335212213411592
>> pct => 1.0862816801353348
>> dlrs => 1.0854998884993752
>> mln => 1.043163996400643
>> from => 0.9684961110525736
>> has => 0.912161511978058
>> company => 0.8754186972808333
>> mar => 0.8675333452422878
>> inc => 0.7678617590362815
>> would => 0.7610968883652675
>> he => 0.7459988770503974
>> which => 0.7435613119406804
>> year => 0.7302840632748394
>> u.s => 0.7281061062439116
>> shares => 0.7260764102983083
>> corp => 0.7179807367808658
>> new => 0.7044203783157115
>> stock => 0.6962010978721442
>> have => 0.6464265467298506
>> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.004, 0.02:0.002, 0.02
>> Top Terms:
>> said => 1.864911184196927
>> dlrs => 1.199286689822081
>> mln => 1.1802134783562215
>> pct => 1.1529704214798124
>> its => 1.1184398851519701
>> from => 1.016647848050332
>> company => 0.894703604722841
>> mar => 0.879986159541356
>> has => 0.8642799128491316
>> year => 0.8271823503717782
>> inc => 0.7871293745341424
>> corp => 0.737705498468879
>> which => 0.722975201852743
>> would => 0.708000816484415
>> u.s => 0.7073294276173905
>> billion => 0.7055723996916351
>> he => 0.7042684217823294
>> new => 0.6834737905434939
>> shares => 0.6753327384172428
>> stock => 0.6576225144041699
>> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.006, 0.02:0.002, 0.02
>> Top Terms:
>> said => 1.8796076179735086
>> its => 1.172025965452378
>> dlrs => 1.130422792460914
>> pct => 1.082038255241358
>> mln => 1.0772146872767114
>> company => 0.9662235879639138
>> from => 0.9473172871605616
>> has => 0.9224712965830099
>> mar => 0.8769325856924421
>> inc => 0.8360245257169788
>> shares => 0.8334595641384324
>> stock => 0.7704621839612175
>> corp => 0.7682400250301806
>> which => 0.7389988207856137
>> would => 0.7339708917389389
>> year => 0.7088414843731325
>> new => 0.7038109468655172
>> he => 0.6993994455501005
>> u.s => 0.6772649147622415
>> share => 0.6241804830055171
>>
>> *lda:*
>>
>> [snip]
>> 21539
>> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
>> 21540
>> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
>> 21541
>> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
>> 21542
>> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
>> 21543
>> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
>> 21544
>> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
>> 21545
>> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
>> 21546
>> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
>> 21547
>> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
>> 21548
>> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
>> 21549
>> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
>> 21550
>> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
>> 21551
>> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
>> 21552
>> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
>> 21553
>> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
>> 21554
>> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
>> 21555
>> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
>> 21556
>> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
>> 21557
>> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
>> 21558
>> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
>> 21559
>> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
>> 21560
>> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
>> 21561
>> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
>> 21562
>> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
>> 21563
>> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
>> 21564
>> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
>> 21565
>> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
>> 21566
>> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
>> 21567
>> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
>> 21568
>> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
>> 21569
>> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
>> 21570
>> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
>> 21571
>> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
>> 21572
>> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
>> 21573
>> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
>> 21574
>> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
>> 21575
>> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
>> 21576
>> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
>> 21577
>> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>>
>>
>> *Streaming k-means:*
>>
>> [snip]
>> INFO: Number of Centroids: 0
>> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
>> WARNING: job_local23982482_0001
>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>
>> [snip]
>>
>> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> Num clusters: 0; maxDistance: 0.000000
>> [Dunn Index] First: Infinity
>> [Davies-Bouldin Index] First: NaN
>> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
>> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
>> andrew.musselman@gmail.com> wrote:
>>
>>> *classify-20newsgroups.sh*
>>>
>>> *Complementary naive bayes:*
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 11207 98.9406%
>>> Incorrectly Classified Instances : 120 1.0594%
>>> Total Classified Instances : 11327
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>>   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
>>> 475   0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0 | 478 a = alt.atheism
>>>   0 597   1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0 | 605 b = comp.graphics
>>>   0   1 620   3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0 | 627 c = comp.os.ms-windows.misc
>>>   1   1   1 593   2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0 | 599 d = comp.sys.ibm.pc.hardware
>>>   0   1   1   0 568   0   1   0   0   0   1   1   2   0   0   0   0   1   0   0 | 576 e = comp.sys.mac.hardware
>>>   0   4   2   0   0 581   0   0   0   0   0   0   0   0   0   0   0   0   0   0 | 587 f = comp.windows.x
>>>   0   0   0   1   2   0 571   3   0   0   1   1   4   1   0   0   0   0   0   0 | 584 g = misc.forsale
>>>   0   0   0   1   0   0   0 589   1   0   0   1   1   0   0   0   0   0   0   0 | 593 h = rec.autos
>>>   0   0   0   0   0   0   0   1 565   0   0   0   0   0   1   0   0   0   0   0 | 567 i = rec.motorcycles
>>>   0   0   0   0   0   0   0   0   0 600   2   0   0   0   1   0   0   0   0   0 | 603 j = rec.sport.baseball
>>>   0   0   0   0   0   0   0   0   0   1 584   0   0   0   0   0   0   0   0   0 | 585 k = rec.sport.hockey
>>>   0   0   0   0   0   0   0   0   0   0   0 579   0   0   0   0   0   1   0   0 | 580 l = sci.crypt
>>>   0   0   0   1   3   0   2   0   0   2   0   0 567   1   2   1   0   0   0   0 | 579 m = sci.electronics
>>>   0   0   0   0   0   0   0   0   0   0   0   0   1 605   0   0   0   0   0   0 | 606 n = sci.med
>>>   0   0   0   0   0   0   0   0   0   0   0   0   0   0 602   0   0   0   0   0 | 602 o = sci.space
>>>   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0 602   0   0   1   0 | 604 p = soc.religion.christian
>>>   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 556   0   0   0 | 556 q = talk.politics.mideast
>>>   0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0 568   0   0 | 571 r = talk.politics.guns
>>>  11   0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4 338   2 | 369 s = talk.religion.misc
>>>   0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0 447 | 456 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.9806
>>> Accuracy 98.9406%
>>> Reliability 94.0932%
>>> Reliability (standard deviation) 0.2163
>>>
>>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 15870 ms (Minutes: 0.2645)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>>>
>>> [snip]
>>>
>>> INFO: Complementary Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 6715 89.3071%
>>> Incorrectly Classified Instances : 804 10.6929%
>>> Total Classified Instances : 7519
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 298 0 0 0 0 0 0 0 0
>>> 1 0 0 0 1 2 5 1 0 13
>>> 0 | 321 a = alt.atheism
>>> 0 298 11 6 1 12 2 2 1
>>> 1 3 8 3 4 2 4 1 4 4
>>> 1 | 368 b = comp.graphics
>>> 1 17 286 16 4 9 6 3 2
>>> 0 1 0 1 7 1 0 2 1 0
>>> 1 | 358 c = comp.os.ms-windows.misc
>>> 2 6 11 309 9 5 14 8 1
>>> 0 2 0 6 4 2 0 1 2 1
>>> 0 | 383 d = comp.sys.ibm.pc.hardware
>>> 0 10 8 7 334 7 5 5 2
>>> 0 3 0 2 1 1 0 1 1 0
>>> 0 | 387 e = comp.sys.mac.hardware
>>> 1 13 7 8 2 355 2 0 2
>>> 0 0 5 1 1 3 0 0 1 0
>>> 0 | 401 f = comp.windows.x
>>> 0 7 11 29 12 9 268 16 8
>>> 4 3 2 6 4 2 1 3 1 2
>>> 3 | 391 g = misc.forsale
>>> 0 1 0 0 3 0 7 362 8
>>> 2 2 1 2 0 2 0 1 2 0
>>> 4 | 397 h = rec.autos
>>> 0 0 0 1 0 0 1 0 423
>>> 0 0 0 2 1 0 1 0 0 0
>>> 0 | 429 i = rec.motorcycles
>>> 0 0 1 0 0 0 0 2 2
>>> 371 8 0 2 3 0 2 0 0 0
>>> 0 | 391 j = rec.sport.baseball
>>> 0 0 1 0 0 0 1 0 0
>>> 2 409 0 0 0 0 0 0 0 0
>>> 1 | 414 k = rec.sport.hockey
>>> 0 0 1 2 1 0 1 0 0
>>> 0 0 404 0 0 0 0 0 1 0
>>> 1 | 411 l = sci.crypt
>>> 0 5 4 11 1 3 7 9 2
>>> 5 3 3 339 2 6 0 1 1 2
>>> 1 | 405 m = sci.electronics
>>> 0 4 0 1 0 0 0 1 0
>>> 1 1 0 3 367 3 1 2 0 0
>>> 0 | 384 n = sci.med
>>> 0 1 2 0 1 0 2 0 0
>>> 1 0 0 1 1 375 0 1 0 0
>>> 0 | 385 o = sci.space
>>> 4 2 1 1 0 0 1 1 2
>>> 0 0 1 1 5 1 367 4 0 1
>>> 1 | 393 p = soc.religion.christian
>>> 0 1 0 0 0 0 0 0 0
>>> 2 0 0 0 0 0 2 378 0 1
>>> 0 | 384 q = talk.politics.mideast
>>> 0 0 0 0 0 2 1 1 1
>>> 1 0 3 0 3 0 0 2 319 2
>>> 4 | 339 r = talk.politics.guns
>>> 32 0 0 1 0 0 0 0 0
>>> 1 1 1 0 2 2 26 5 7 175
>>> 6 | 259 s = talk.religion.misc
>>> 0 0 0 2 0 0 0 0 0
>>> 1 2 2 0 1 2 1 10 18 2
>>> 278 | 319 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.8594
>>> Accuracy 89.3071%
>>> Reliability 84.611%
>>> Reliability (standard deviation) 0.2148
>>>
>>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>>
>>>
>>> *Naive bayes:*
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 11286 99.0869%
>>> Incorrectly Classified Instances : 104 0.9131%
>>> Total Classified Instances : 11390
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 474 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 2
>>> 1 | 477 a = alt.atheism
>>> 0 566 0 2 0 1 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 569 b = comp.graphics
>>> 0 10 590 29 2 4 1 0 0
>>> 0 0 0 1 0 0 0 0 0 0
>>> 1 | 638 c = comp.os.ms-windows.misc
>>> 0 0 0 596 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 596 d = comp.sys.ibm.pc.hardware
>>> 0 0 0 0 575 0 1 0 0
>>> 0 0 0 1 0 0 0 0 0 0
>>> 0 | 577 e = comp.sys.mac.hardware
>>> 0 2 2 2 0 593 1 0 0
>>> 0 0 0 0 0 1 0 0 0 0
>>> 0 | 601 f = comp.windows.x
>>> 0 0 0 1 0 0 589 1 0
>>> 0 1 0 2 0 0 0 0 0 0
>>> 0 | 594 g = misc.forsale
>>> 0 0 0 0 0 0 0 594 0
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 594 h = rec.autos
>>> 0 0 0 0 0 0 0 0 611
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 611 i = rec.motorcycles
>>> 0 0 0 0 0 0 0 0 0
>>> 616 1 0 0 0 0 0 0 0 0
>>> 0 | 617 j = rec.sport.baseball
>>> 0 0 0 0 0 0 1 0 0
>>> 0 620 0 0 0 0 0 0 0 0
>>> 0 | 621 k = rec.sport.hockey
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 580 0 0 0 0 0 1 0
>>> 0 | 581 l = sci.crypt
>>> 0 0 0 3 1 0 0 0 0
>>> 0 0 0 571 0 0 0 0 0 0
>>> 0 | 575 m = sci.electronics
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 2 583 0 0 0 0 0
>>> 0 | 585 n = sci.med
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 1 599 0 0 0 0
>>> 0 | 600 o = sci.space
>>> 0 1 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 615 0 0 0
>>> 0 | 616 p = soc.religion.christian
>>> 1 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 1 560 0 0
>>> 0 | 562 q = talk.politics.mideast
>>> 0 0 1 0 0 0 0 0 0
>>> 0 0 1 0 0 0 0 0 548 0
>>> 1 | 551 r = talk.politics.guns
>>> 10 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 1 1 0 2 344
>>> 1 | 359 s = talk.religion.misc
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 1 1 0 0 0 0 2 0
>>> 462 | 466 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.9847
>>> Accuracy 99.0869%
>>> Reliability 94.3334%
>>> Reliability (standard deviation) 0.2169
>>>
>>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 14304 ms (Minutes: 0.2384)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>>
>>> [snip]
>>>
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 6718 90.1019%
>>> Incorrectly Classified Instances : 738 9.8981%
>>> Total Classified Instances : 7456
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 294 0 0 0 0 0 0 0 0
>>> 0 0 2 0 1 1 6 1 1 16
>>> 0 | 322 a = alt.atheism
>>> 0 345 6 14 6 11 6 0 0
>>> 0 0 5 7 1 3 0 0 0 0
>>> 0 | 404 b = comp.graphics
>>> 2 29 177 78 22 19 9 1 0
>>> 0 0 4 2 0 1 1 0 0 1
>>> 1 | 347 c = comp.os.ms-windows.misc
>>> 1 9 2 335 18 2 10 0 0
>>> 0 1 0 8 0 0 0 0 0 0
>>> 0 | 386 d = comp.sys.ibm.pc.hardware
>>> 1 4 2 13 347 3 5 1 0
>>> 0 1 0 7 1 0 0 0 1 0
>>> 0 | 386 e = comp.sys.mac.hardware
>>> 0 20 0 4 0 352 4 0 0
>>> 0 0 0 1 1 3 0 1 0 1
>>> 0 | 387 f = comp.windows.x
>>> 0 2 0 21 5 1 323 7 2
>>> 2 0 2 12 0 3 0 0 0 0
>>> 1 | 381 g = misc.forsale
>>> 0 1 0 0 1 0 15 363 8
>>> 1 0 0 4 1 0 0 0 1 0
>>> 1 | 396 h = rec.autos
>>> 0 1 0 0 0 0 6 6 370
>>> 0 0 0 0 1 0 0 0 0 1
>>> 0 | 385 i = rec.motorcycles
>>> 1 0 0 1 1 0 2 1 2
>>> 362 5 0 2 0 0 0 0 0 0
>>> 0 | 377 j = rec.sport.baseball
>>> 0 0 0 1 2 0 0 0 0
>>> 3 371 0 0 0 0 0 0 0 0
>>> 1 | 378 k = rec.sport.hockey
>>> 0 3 1 0 1 0 2 0 0
>>> 0 0 396 0 1 0 0 1 1 1
>>> 3 | 410 l = sci.crypt
>>> 0 7 0 7 7 2 6 4 0
>>> 0 0 1 369 2 2 0 0 0 0
>>> 2 | 409 m = sci.electronics
>>> 0 3 0 2 1 0 2 0 0
>>> 0 0 1 4 383 4 0 0 1 0
>>> 4 | 405 n = sci.med
>>> 0 5 0 0 1 0 3 0 0
>>> 0 0 0 1 0 374 1 0 0 1
>>> 1 | 387 o = sci.space
>>> 6 2 0 1 1 0 0 1 0
>>> 1 0 0 1 5 0 352 2 1 7
>>> 1 | 381 p = soc.religion.christian
>>> 1 1 0 0 0 0 0 0 0
>>> 0 1 0 0 0 0 0 373 1 0
>>> 1 | 378 q = talk.politics.mideast
>>> 0 0 0 0 0 0 1 0 1
>>> 0 0 2 0 0 0 0 0 346 2
>>> 7 | 359 r = talk.politics.guns
>>> 26 1 0 1 0 0 0 2 0
>>> 1 1 0 0 1 1 20 2 6 200
>>> 7 | 269 s = talk.religion.misc
>>> 1 0 0 0 0 0 0 2 0
>>> 0 1 0 0 2 2 0 1 14 0
>>> 286 | 309 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.8726
>>> Accuracy 90.1019%
>>> Reliability 85.4491%
>>> Reliability (standard deviation) 0.2222
>>>
>>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>>
>>> *SGD:*
>>> 7532 test files
>>>
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 5649 75%
>>> Incorrectly Classified Instances : 1883 25%
>>> Total Classified Instances : 7532
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 186 6 3 10 5 0 33 4 13
>>> 15 7 1 24 15 3 15 5 5 29
>>> 15 | 394 a = sci.space
>>> 5 309 0 3 2 5 0 0 0
>>> 1 9 21 2 0 0 18 4 4 1
>>> 1 | 385 b = comp.sys.mac.hardware
>>> 4 1 101 3 0 1 63 0 7
>>> 0 1 1 5 16 3 0 3 7 1
>>> 34 | 251 c = talk.religion.misc
>>> 11 12 1 265 1 10 3 0 0
>>> 17 10 11 5 2 0 11 3 6 21
>>> 0 | 389 d = comp.graphics
>>> 2 1 1 0 349 2 3 0 3
>>> 2 6 1 5 1 0 2 15 2 1
>>> 2 | 398 e = rec.motorcycles
>>> 7 20 3 19 2 254 6 0 2
>>> 11 2 39 7 2 0 4 2 2 9
>>> 3 | 394 f = comp.os.ms-windows.misc
>>> 2 1 13 0 0 0 247 0 1
>>> 1 3 0 6 2 4 0 2 3 5
>>> 29 | 319 g = alt.atheism
>>> 1 1 0 0 2 0 2 361 0
>>> 1 2 0 2 0 0 1 3 22 0
>>> 1 | 399 h = rec.sport.hockey
>>> 3 0 3 1 0 0 5 0 161
>>> 0 1 2 12 102 0 0 1 2 11
>>> 6 | 310 i = talk.politics.misc
>>> 2 8 0 19 0 19 0 0 1
>>> 294 10 11 4 2 0 5 0 3 11
>>> 6 | 395 j = comp.windows.x
>>> 2 10 0 1 1 0 0 0 0
>>> 1 347 13 2 1 0 5 3 2 2
>>> 0 | 390 k = misc.forsale
>>> 1 36 0 6 1 25 0 0 1
>>> 6 10 257 2 1 0 34 6 0 6
>>> 0 | 392 l = comp.sys.ibm.pc.hardware
>>> 2 2 2 2 1 0 12 0 0
>>> 6 10 4 312 5 2 13 11 3 3
>>> 6 | 396 m = sci.med
>>> 2 0 3 2 1 0 0 1 13
>>> 0 5 1 2 314 2 0 2 2 10
>>> 4 | 364 n = talk.politics.guns
>>> 1 0 2 1 1 0 34 1 33
>>> 1 3 0 1 8 271 1 4 5 6
>>> 3 | 376 o = talk.politics.mideast
>>> 3 14 0 8 2 8 3 1 1
>>> 7 12 29 6 2 1 245 13 2 32
>>> 4 | 393 p = sci.electronics
>>> 3 3 0 2 11 0 1 0 2
>>> 1 11 6 4 2 0 11 330 4 4
>>> 1 | 396 q = rec.autos
>>> 0 0 1 0 1 0 4 12 3
>>> 1 3 0 0 0 0 5 6 359 1
>>> 1 | 397 r = rec.sport.baseball
>>> 0 1 0 0 0 1 0 0 3
>>> 3 0 0 3 2 1 6 1 6 366
>>> 3 | 396 s = sci.crypt
>>> 0 2 11 1 1 0 40 0 1
>>> 2 3 4 2 1 0 5 0 2 2
>>> 321 | 398 t = soc.religion.christian
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.7073
>>> Accuracy 75%
>>> Reliability 70.6238%
>>> Reliability (standard deviation) 0.2187
>>> Log-likelihood mean : -1.1182
>>> 25%-ile : -1.6911
>>> 75%-ile : -0.0803
>>>
>>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>>
>>>
>>>
>>>
>>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
>>>
>>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
>>>> and a few other issues.
>>>>
>>>> We have removed asf-examples*.sh from trunk, as the sample file at the
>>>> URL mentioned in your email is not available.
>>>> This is something we need to fix and restore in 1.0.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
>>>> ap.dev@outlook.com> wrote:
>>>>
>>>> from the asf-email-examples.sh script:
>>>>
>>>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset
>>>> # (http://aws.amazon.com/datasets/7791434387204566) to use this script.
>>>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it,
>>>> # otherwise you can get a sample of it at
>>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> It looks like the:
>>>>
>>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> link is down.
>>>>
>>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>>
>>>>
>>>>
>>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>>> > From: andrew.musselman@gmail.com
>>>> > To: dev@mahout.apache.org
>>>> >
>>>> > Sure thing; continuing to smoke test the other examples tonight
>>>> >
>>>> >
>>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com>wrote:
>>>> >
>>>> > > Thanks, Andrew M. I see that some of the example scripts need to be
>>>> > > fixed, as they still refer to the deprecated algorithms.
>>>> > > I see that Streaming KMeans has failed for you as well.
>>>> > >
>>>> > > I'll be rolling back the release today to fix these issues.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
>>>> > > Linux AMI from tarball.
>>>> > >
>>>> > > All tests pass.
>>>> > >
>>>> > > *Output of examples:*
>>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>>> > > <http://mahout.apache.org>:*
>>>> > > *recommendations:*
>>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
>>>> > > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>>> > > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>>> > > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>>> > > 8 [12758:1.0,19409:1.0,11112:1.0]
>>>> > > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>>> > > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>>> > > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>>> > > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>>> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>>> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>>> > > [snip]
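For anyone who wants to sanity-check the recommender output above programmatically, here is a minimal Python sketch. It assumes each record is a user ID, some whitespace, and a bracketed comma-separated list of item:score pairs, as in the line starting with "19" above. The function name parse_recommendation is made up for illustration and is not part of Mahout.

```python
def parse_recommendation(line):
    """Parse 'userID [item:score,item:score,...]' into (user_id, [(item, score), ...]).

    Assumed record layout, inferred from the pasted output; not a Mahout API.
    """
    user_part, _, items_part = line.strip().partition("[")
    user_id = int(user_part)  # int() tolerates the surrounding whitespace or tab
    pairs = []
    for entry in items_part.rstrip("]").split(","):
        if not entry:
            continue  # tolerate an empty recommendation list
        item, _, score = entry.partition(":")
        pairs.append((int(item), float(score)))
    return user_id, pairs


if __name__ == "__main__":
    user, recs = parse_recommendation("8\t[12758:1.0,19409:1.0,11112:1.0]")
    print(user, len(recs), recs[0])
```

With records parsed this way it is easy to check, for example, that no user gets more than ten recommendations or that every score is 1.0 as in the sample shown.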
>>>> > >
>>>> > > *clustering; kmeans:*
>>>> > > [snip]
>>>> > > Weight : [props - optional]: Point:
>>>> > > 1.0 :
>>>> > > [distance-squared=1.0193102046188427]:
>>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
>>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>>> 7573:0.204,
>>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>>> 9779:0.159,
>>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>>> > > 1.0 : [distance-squared=0.9823018320457279]:
>>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>>> 5336:0.106,
>>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>>> 7832:0.072,
>>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>>> > > 1.0 : [distance-squared=0.9509142993214911]:
>>>> > >
>>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
>>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
>>>> > > 4419:0.076,
>>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>>>> 7235:0.048,
>>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>>> 7683:0.077,
>>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>>> 10225:0.081,
>>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
>>>> > > 43685:0.086, 44077:0.308,
>>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>>> > > [snip]
>>>> > >
>>>> > > *clustering; dirichlet:*
>>>> > > Get this complaint:
>>>> > > Running Dirichlet with K = 8
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
>>>> > > Unknown program 'dirichlet' chosen.
>>>> > >
>>>> > > *clustering: minhash:*
>>>> > > Running Minhash
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
>>>> > > Unknown program 'minhash' chosen.
>>>> > >
>>>> > > *classification; standard:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances : 5384 87.7874%
>>>> > > Incorrectly Classified Instances : 749 12.2126%
>>>> > > Total Classified Instances : 6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a     b     c     d     <--Classified as
>>>> > > 2949  7     531   25    |  3512  a = dev
>>>> > > 0     0     0     0     |  0     b = general
>>>> > > 99    8     1763  8     |  1878  c = user
>>>> > > 41    1     29    672   |  743   d = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa 0.7877
>>>> > > Accuracy 87.7874%
>>>> > > Reliability 53.658%
>>>> > > Reliability (standard deviation) 0.4911
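As a cross-check on the summary figures, the accuracy can be recomputed from the four-class confusion matrix of the standard classifier above (rows are the true labels dev, general, user, commits). The Python sketch below does that and also computes the textbook Cohen's kappa. This is not Mahout's code, and the kappa it yields need not match the reported 0.7877 exactly if Mahout computes that statistic differently. The function name accuracy_and_kappa is illustrative only.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] = number of instances of true class i classified as class j.
    Textbook definitions; not taken from the Mahout source.
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Standard classifier matrix from the asf-email-examples.sh run above
    standard = [
        [2949, 7, 531, 25],   # a = dev
        [0, 0, 0, 0],         # b = general
        [99, 8, 1763, 8],     # c = user
        [41, 1, 29, 672],     # d = commits
    ]
    acc, kappa = accuracy_and_kappa(standard)
    print(f"accuracy = {acc:.4%}, kappa = {kappa:.4f}")
```

The diagonal sums to 5384 out of 6133 instances, which agrees with the "Correctly Classified Instances" line in the summary.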
>>>> > >
>>>> > > *classification; complementary:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances : 5530 90.1679%
>>>> > > Incorrectly Classified Instances : 603 9.8321%
>>>> > > Total Classified Instances : 6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a     b     c     d     <--Classified as
>>>> > > 3168  0     276   68    |  3512  a = dev
>>>> > > 0     0     0     0     |  0     b = general
>>>> > > 196   0     1652  30    |  1878  c = user
>>>> > > 25    0     8     710   |  743   d = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa 0.8259
>>>> > > Accuracy 90.1679%
>>>> > > Reliability 54.7459%
>>>> > > Reliability (standard deviation) 0.5005
>>>> > >
>>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>>>> > >
>>>> > > *classification; sgd, with three categories:*
>>>> > > Running SGD Training
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
>>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>>> > > 24168 training files
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
>>>> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
>>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
>>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
>>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
>>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
>>>> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
>>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
>>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
>>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
>>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
>>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>> > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>> > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > >     at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>>> > >     at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>>> > >     at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>>> > >     at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>>> > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> > >     at java.lang.Thread.run(Thread.java:701)
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > > Trying out the build today
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com
>>>> > > >wrote:
>>>> > > >
>>>> > > >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
>>>> > > >> release. I will be rerolling the release today (in the next few hours) and
>>>> > > >> putting out a new release candidate in staging.
>>>> > > >>
>>>> > > >> Thanks for reporting this, Andrew P.
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>>> > > ap.dev@outlook.com>
>>>> > > >> wrote:
>>>> > > >>
>>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
>>>> > > >> a bit of trouble getting the Hadoop natives to compile, and therefore may
>>>> > > >> have run into some problems because of the Hadoop setup. I ran into some
>>>> > > >> problems in the example scripts, particularly with
>>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
>>>> > > >> examples when I'm sure I've got Hadoop set up right.
>>>> > > >>
>>>> > > >>
>>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
>>>> "amd64",
>>>> > > >> family: "unix"
>>>> > > >> $MAHOUT_LOCAL=true
>>>> > > >> Hadoop 2.2.0
>>>> > > >>
>>>> > > >>
>>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
>>>> > > >> [passed]
>>>> > > >>
>>>> > > >> b) Verify u r able to compile the distro
>>>> > > >>
>>>> > > >> mvn compile - [passed with warnings]
>>>> > > >>
>>>> > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
>>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>>>> > > >> [WARNING] Multiple versions of scala libraries detected!
>>>> > > >>
>>>> > > >> c) Run through the unit tests: mvn clean test
>>>> > > >> mvn clean test [passed]
>>>> > > >>
>>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>>>> > > >> Please run through all the different options in each script
>>>> > > >>
>>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
>>>> > > >>
>>>> > > >>
>>>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
>>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
>>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>>> > > >> [...]
>>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
>>>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >>     at java.lang.Class.forName0(Native Method)
>>>> > > >>     at java.lang.Class.forName(Class.java:171)
>>>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>
>>>> > > >>
>>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>>> > > >>
>>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >> at java.security.AccessController.doPrivileged(Native Method)
>>>> > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >> at java.lang.Class.forName0(Native Method)
>>>> > > >> at java.lang.Class.forName(Class.java:171)
>>>> > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>>>> > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>>>> > > >>
>>>> > > >>
>>>> > > >> ./classify-20newsgroups.sh ->1 [works]
>>>> > > >> ./classify-20newsgroups.sh ->2 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >> cluster-reuters.sh ->1 [works]
>>>> > > >> cluster-reuters.sh ->2 [works]
>>>> > > >> cluster-reuters.sh ->3 [works]
>>>> > > >>
>>>> > > >> Same error as noted previously in the thread:
>>>> > > >>
>>>> > > >> cluster-reuters.sh ->4 [0 clusters]
>>>> > > >>
>>>> > > >> [...]
>>>> > > >>
>>>> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>>>> > > >> Num clusters: 0; maxDistance: 0.000000
>>>> > > >> [Dunn Index] First: Infinity
>>>> > > >> [Davies-Bouldin Index] First: NaN
>>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
>>>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
*factorize-movielens-1M.sh:*
RMSE is:
0.8519064098265133
Sample recommendations:
2229 [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
5848 [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
3728 [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
1252 [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
634 [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
4219 [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
502 [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
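
For anyone who wants to sanity-check the sample recommendations programmatically, here is a minimal Python sketch. It assumes each record sits on one line as `userID [itemID:score,...]`, which is how the lines above read once mail wrapping is undone. The function name `parse_recommendations` is made up for illustration and is not part of Mahout.

```python
def parse_recommendations(line):
    """Parse one 'userID [itemID:score,...]' record into (user_id, [(item_id, score), ...]).

    Assumption: the user ID and the bracketed list are separated by
    whitespace (tab or space) and the record is on a single line.
    """
    user_part, items_part = line.split(None, 1)
    items_part = items_part.strip()
    if not (items_part.startswith("[") and items_part.endswith("]")):
        raise ValueError("expected a bracketed item list: %r" % items_part)

    recs = []
    inner = items_part[1:-1]
    if inner:
        for pair in inner.split(","):
            item_id, score = pair.split(":")
            recs.append((int(item_id), float(score)))
    return int(user_part), recs


if __name__ == "__main__":
    sample = "91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]"
    user, recs = parse_recommendations(sample)
    # Check that no predicted rating falls outside the 1-5 MovieLens scale.
    print(user, all(1.0 <= score <= 5.0 for _, score in recs))
```

A check like the one in the `__main__` block is a quick way to confirm that predicted ratings stay within the rating scale of the data set.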
*factorize-netflix.sh:*
References a data set that is no longer available; Netflix took it down
after the competition. The script should at least mention that the data set
is no longer online.
On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:
> *cluster-syntheticcontrol.sh*
>
> *Canopy:*
> [snip]
> 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> 10.017, 17.149, 14.850, 10.890]
> 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> 16.430, 11.898, 15.366]
> 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
> 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
> 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
> 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
> 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
>
> *K-means:*
> [snip]
> 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> 36.946, 36.900, 32.593, 31.563]
> 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> 18.967, 35.473, 30.146, 45.527]
> 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> 41.024, 37.105, 45.623, 43.589]
> 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> 24.508, 34.432, 32.155, 34.839]
> 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> 18.111, 14.754, 27.386, 27.026]
> 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> 39.404, 38.034, 46.458, 44.432]
> 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> 28.215, 34.940, 36.910, 39.749]
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 16902 ms (Minutes: 0.2817)
>
> *Fuzzy k-means:*
> [snip]
> 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> 15.285, 22.528, 20.657, 24.129]
> 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> 20.229, 11.131, 9.980, 10.720]
> 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> 16.546, 15.927, 18.084, 17.475]
> 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> 19.310, 12.999, 17.460]
> 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> 11.743, 11.699, 10.152]
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Wrote 6 clusters
> Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
>
> *Dirichlet and Meanshift:*
> Already detailed in M-1400: the script still references the deprecated jobs.
>
>
>
> On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> *cluster-reuters.sh*
>> *k-means:*
>>
>> [snip]
>> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
>> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
>> Top Terms:
>> banks =>
>> 3.841823268955143
>> bank =>
>> 3.80633066361209
>> debt =>
>> 3.28065219870794
>> said =>
>> 2.5965700942088583
>> he =>
>> 2.335682813857497
>> foreign =>
>> 2.2217853688201403
>> billion =>
>> 2.1970193848291335
>> would =>
>> 1.9932392063955617
>> loans =>
>> 1.9309276792854233
>> interest =>
>> 1.787324501938
>> have =>
>> 1.762981951432578
>> its =>
>> 1.7615109954971866
>> which =>
>> 1.5822081148036862
>> has =>
>> 1.5600708189041956
>> dlrs =>
>> 1.5571038313005996
>> finance =>
>> 1.5539758811252924
>> new =>
>> 1.5176015811577555
>> had =>
>> 1.5138723701401844
>> brazil =>
>> 1.5083369853593172
>> payments =>
>> 1.4539044255886517
>> Weight : [props - optional]: Point:
>>
>> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
>> 0.40:0.003, 0.5:0.009, 0.57:0
>> Top Terms:
>> vs =>
>> 6.126130791333171
>> net =>
>> 4.012191567277523
>> cts =>
>> 3.822006848832744
>> shr =>
>> 3.6786004856764527
>> mln =>
>> 2.9011643584038698
>> loss =>
>> 2.788368861463607
>> qtr =>
>> 2.714140225051522
>> revs =>
>> 2.4739861236454717
>> profit =>
>> 1.8146888090247015
>> note =>
>> 1.7977163272138388
>> dlrs =>
>> 1.6164390808155846
>> avg =>
>> 1.3901765773336587
>> shrs =>
>> 1.3856326531419314
>> mths =>
>> 1.3168717272038506
>> 4th =>
>> 1.2161158425617289
>> oper =>
>> 1.182419473776814
>> year =>
>> 1.178086061733047
>> nine =>
>> 1.0670554836445316
>> 3rd =>
>> 1.041334410056592
>> inc =>
>> 1.0019361981554935
>> Weight : [props - optional]: Point:
>>
>>
>> Inter-Cluster Density: 0.45562152681859414
>> Intra-Cluster Density: 0.6952712632167628
>> CDbw Inter-Cluster Density: 0.0
>> CDbw Intra-Cluster Density: 16.486930227598684
>> CDbw Separation: 194.49005884464628
>>
>> *fuzzy k-means:*
>> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.005, 0.02:0.002, 0.0
>> Top Terms:
>> said =>
>> 1.8665592354713065
>> its =>
>> 1.1335212213411592
>> pct =>
>> 1.0862816801353348
>> dlrs =>
>> 1.0854998884993752
>> mln =>
>> 1.043163996400643
>> from =>
>> 0.9684961110525736
>> has =>
>> 0.912161511978058
>> company =>
>> 0.8754186972808333
>> mar =>
>> 0.8675333452422878
>> inc =>
>> 0.7678617590362815
>> would =>
>> 0.7610968883652675
>> he =>
>> 0.7459988770503974
>> which =>
>> 0.7435613119406804
>> year =>
>> 0.7302840632748394
>> u.s =>
>> 0.7281061062439116
>> shares =>
>> 0.7260764102983083
>> corp =>
>> 0.7179807367808658
>> new =>
>> 0.7044203783157115
>> stock =>
>> 0.6962010978721442
>> have =>
>> 0.6464265467298506
>> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.004, 0.02:0.002, 0.02
>> Top Terms:
>> said =>
>> 1.864911184196927
>> dlrs =>
>> 1.199286689822081
>> mln =>
>> 1.1802134783562215
>> pct =>
>> 1.1529704214798124
>> its =>
>> 1.1184398851519701
>> from =>
>> 1.016647848050332
>> company =>
>> 0.894703604722841
>> mar =>
>> 0.879986159541356
>> has =>
>> 0.8642799128491316
>> year =>
>> 0.8271823503717782
>> inc =>
>> 0.7871293745341424
>> corp =>
>> 0.737705498468879
>> which =>
>> 0.722975201852743
>> would =>
>> 0.708000816484415
>> u.s =>
>> 0.7073294276173905
>> billion =>
>> 0.7055723996916351
>> he =>
>> 0.7042684217823294
>> new =>
>> 0.6834737905434939
>> shares =>
>> 0.6753327384172428
>> stock =>
>> 0.6576225144041699
>> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
>> 0.01:0.006, 0.02:0.002, 0.02
>> Top Terms:
>> said =>
>> 1.8796076179735086
>> its =>
>> 1.172025965452378
>> dlrs =>
>> 1.130422792460914
>> pct =>
>> 1.082038255241358
>> mln =>
>> 1.0772146872767114
>> company =>
>> 0.9662235879639138
>> from =>
>> 0.9473172871605616
>> has =>
>> 0.9224712965830099
>> mar =>
>> 0.8769325856924421
>> inc =>
>> 0.8360245257169788
>> shares =>
>> 0.8334595641384324
>> stock =>
>> 0.7704621839612175
>> corp =>
>> 0.7682400250301806
>> which =>
>> 0.7389988207856137
>> would =>
>> 0.7339708917389389
>> year =>
>> 0.7088414843731325
>> new =>
>> 0.7038109468655172
>> he =>
>> 0.6993994455501005
>> u.s =>
>> 0.6772649147622415
>> share =>
>> 0.6241804830055171
>>
>> *lda:*
>>
>> [snip]
>> 21539
>> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
>> 21540
>> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
>> 21541
>> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
>> 21542
>> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
>> 21543
>> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
>> 21544
>> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
>> 21545
>> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
>> 21546
>> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
>> 21547
>> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
>> 21548
>> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
>> 21549
>> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
>> 21550
>> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
>> 21551
>> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
>> 21552
>> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
>> 21553
>> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
>> 21554
>> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
>> 21555
>> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
>> 21556
>> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
>> 21557
>> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
>> 21558
>> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
>> 21559
>> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
>> 21560
>> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
>> 21561
>> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
>> 21562
>> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
>> 21563
>> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
>> 21564
>> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
>> 21565
>> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
>> 21566
>> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
>> 21567
>> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
>> 21568
>> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
>> 21569
>> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
>> 21570
>> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
>> 21571
>> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
>> 21572
>> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
>> 21573
>> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
>> 21574
>> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
>> 21575
>> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
>> 21576
>> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
>> 21577
>> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>>
>>
>> *Streaming k-means:*
>>
>> [snip]
>> INFO: Number of Centroids: 0
>> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
>> WARNING: job_local23982482_0001
>> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>
>> [snip]
>>
>> WARNING: No qualcluster.props found on classpath, will use command-line
>> arguments only
>> Num clusters: 0; maxDistance: 0.000000
>> [Dunn Index] First: Infinity
>> [Davies-Bouldin Index] First: NaN
>> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
>> cluster,distance.mean,distance.sd
>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
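A note on the exception text above: the unexpanded `%.1f %% of %d` placeholders, followed by `[10.000000149011612, 0]`, match how Guava's `Preconditions.checkArgument` builds its message. It substitutes only `%s` placeholders and appends any unmatched arguments in square brackets. The Python sketch below imitates that behaviour for illustration; `lenient_format` is a hypothetical helper, not Guava or Mahout code.

```python
def lenient_format(template, *args):
    """Substitute only '%s' placeholders; append leftover args in brackets."""
    parts = []
    pos = 0
    used = 0
    while used < len(args):
        idx = template.find("%s", pos)
        if idx == -1:
            break  # no more '%s' placeholders to fill
        parts.append(template[pos:idx])
        parts.append(str(args[used]))
        pos = idx + 2
        used += 1
    parts.append(template[pos:])
    if used < len(args):
        # Arguments without a matching '%s' are appended verbatim.
        leftovers = ", ".join(str(a) for a in args[used:])
        parts.append(" [" + leftovers + "]")
    return "".join(parts)


# '%.1f' and '%d' are not '%s', so nothing is substituted here.
print(lenient_format("Asked for %.1f %% of %d vectors for test", 10, 0))
```

Read this way, the numbers in brackets are the requested test percentage and the vector count, which is 0, in line with the "Number of Centroids: 0" line in the log.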
>>
>> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
>> andrew.musselman@gmail.com> wrote:
>>
>>> *classify-20newsgroups.sh*
>>>
>>> *Complementary naive bayes:*
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 11207 98.9406%
>>> Incorrectly Classified Instances : 120 1.0594%
>>> Total Classified Instances : 11327
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 475 0 0 1 0 0 0 0 0
>>> 0 0 0 0 0 1 0 1 0 0
>>> 0 | 478 a = alt.atheism
>>> 0 597 1 1 0 1 1 0 0
>>> 0 0 1 0 2 1 0 0 0 0
>>> 0 | 605 b = comp.graphics
>>> 0 1 620 3 0 1 0 0 0
>>> 0 0 1 0 0 1 0 0 0 0
>>> 0 | 627 c = comp.os.ms-windows.misc
>>> 1 1 1 593 2 0 0 0 0
>>> 0 0 0 0 0 0 1 0 0 0
>>> 0 | 599 d = comp.sys.ibm.pc.hardware
>>> 0 1 1 0 568 0 1 0 0
>>> 0 1 1 2 0 0 0 0 1 0
>>> 0 | 576 e = comp.sys.mac.hardware
>>> 0 4 2 0 0 581 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 587 f = comp.windows.x
>>> 0 0 0 1 2 0 571 3 0
>>> 0 1 1 4 1 0 0 0 0 0
>>> 0 | 584 g = misc.forsale
>>> 0 0 0 1 0 0 0 589 1
>>> 0 0 1 1 0 0 0 0 0 0
>>> 0 | 593 h = rec.autos
>>> 0 0 0 0 0 0 0 1 565
>>> 0 0 0 0 0 1 0 0 0 0
>>> 0 | 567 i = rec.motorcycles
>>> 0 0 0 0 0 0 0 0 0
>>> 600 2 0 0 0 1 0 0 0 0
>>> 0 | 603 j = rec.sport.baseball
>>> 0 0 0 0 0 0 0 0 0
>>> 1 584 0 0 0 0 0 0 0 0
>>> 0 | 585 k = rec.sport.hockey
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 579 0 0 0 0 0 1 0
>>> 0 | 580 l = sci.crypt
>>> 0 0 0 1 3 0 2 0 0
>>> 2 0 0 567 1 2 1 0 0 0
>>> 0 | 579 m = sci.electronics
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 1 605 0 0 0 0 0
>>> 0 | 606 n = sci.med
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 602 0 0 0 0
>>> 0 | 602 o = sci.space
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 1 0 602 0 0 1
>>> 0 | 604 p = soc.religion.christian
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 556 0 0
>>> 0 | 556 q = talk.politics.mideast
>>> 0 0 1 0 0 0 0 0 0
>>> 0 0 1 0 0 1 0 0 568 0
>>> 0 | 571 r = talk.politics.guns
>>> 11 0 0 0 0 0 0 0 0
>>> 1 0 0 0 1 3 8 1 4 338
>>> 2 | 369 s = talk.religion.misc
>>> 0 0 0 0 0 0 0 0 0
>>> 0 1 0 0 0 1 0 3 4 0
>>> 447 | 456 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.9806
>>> Accuracy 98.9406%
>>> Reliability 94.0932%
>>> Reliability (standard deviation) 0.2163
>>>
>>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 15870 ms (Minutes: 0.2645)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors
>>> -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex
>>> -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
>>>
>>> [snip]
>>>
>>> INFO: Complementary Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 6715 89.3071%
>>> Incorrectly Classified Instances : 804 10.6929%
>>> Total Classified Instances : 7519
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 298 0 0 0 0 0 0 0 0
>>> 1 0 0 0 1 2 5 1 0 13
>>> 0 | 321 a = alt.atheism
>>> 0 298 11 6 1 12 2 2 1
>>> 1 3 8 3 4 2 4 1 4 4
>>> 1 | 368 b = comp.graphics
>>> 1 17 286 16 4 9 6 3 2
>>> 0 1 0 1 7 1 0 2 1 0
>>> 1 | 358 c = comp.os.ms-windows.misc
>>> 2 6 11 309 9 5 14 8 1
>>> 0 2 0 6 4 2 0 1 2 1
>>> 0 | 383 d = comp.sys.ibm.pc.hardware
>>> 0 10 8 7 334 7 5 5 2
>>> 0 3 0 2 1 1 0 1 1 0
>>> 0 | 387 e = comp.sys.mac.hardware
>>> 1 13 7 8 2 355 2 0 2
>>> 0 0 5 1 1 3 0 0 1 0
>>> 0 | 401 f = comp.windows.x
>>> 0 7 11 29 12 9 268 16 8
>>> 4 3 2 6 4 2 1 3 1 2
>>> 3 | 391 g = misc.forsale
>>> 0 1 0 0 3 0 7 362 8
>>> 2 2 1 2 0 2 0 1 2 0
>>> 4 | 397 h = rec.autos
>>> 0 0 0 1 0 0 1 0 423
>>> 0 0 0 2 1 0 1 0 0 0
>>> 0 | 429 i = rec.motorcycles
>>> 0 0 1 0 0 0 0 2 2
>>> 371 8 0 2 3 0 2 0 0 0
>>> 0 | 391 j = rec.sport.baseball
>>> 0 0 1 0 0 0 1 0 0
>>> 2 409 0 0 0 0 0 0 0 0
>>> 1 | 414 k = rec.sport.hockey
>>> 0 0 1 2 1 0 1 0 0
>>> 0 0 404 0 0 0 0 0 1 0
>>> 1 | 411 l = sci.crypt
>>> 0 5 4 11 1 3 7 9 2
>>> 5 3 3 339 2 6 0 1 1 2
>>> 1 | 405 m = sci.electronics
>>> 0 4 0 1 0 0 0 1 0
>>> 1 1 0 3 367 3 1 2 0 0
>>> 0 | 384 n = sci.med
>>> 0 1 2 0 1 0 2 0 0
>>> 1 0 0 1 1 375 0 1 0 0
>>> 0 | 385 o = sci.space
>>> 4 2 1 1 0 0 1 1 2
>>> 0 0 1 1 5 1 367 4 0 1
>>> 1 | 393 p = soc.religion.christian
>>> 0 1 0 0 0 0 0 0 0
>>> 2 0 0 0 0 0 2 378 0 1
>>> 0 | 384 q = talk.politics.mideast
>>> 0 0 0 0 0 2 1 1 1
>>> 1 0 3 0 3 0 0 2 319 2
>>> 4 | 339 r = talk.politics.guns
>>> 32 0 0 1 0 0 0 0 0
>>> 1 1 1 0 2 2 26 5 7 175
>>> 6 | 259 s = talk.religion.misc
>>> 0 0 0 2 0 0 0 0 0
>>> 1 2 2 0 1 2 1 10 18 2
>>> 278 | 319 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.8594
>>> Accuracy 89.3071%
>>> Reliability 84.611%
>>> Reliability (standard deviation) 0.2148
>>>
>>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>>
>>>
>>> *Naive bayes:*
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 11286 99.0869%
>>> Incorrectly Classified Instances : 104 0.9131%
>>> Total Classified Instances : 11390
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 474 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 2
>>> 1 | 477 a = alt.atheism
>>> 0 566 0 2 0 1 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 569 b = comp.graphics
>>> 0 10 590 29 2 4 1 0 0
>>> 0 0 0 1 0 0 0 0 0 0
>>> 1 | 638 c = comp.os.ms-windows.misc
>>> 0 0 0 596 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 596 d = comp.sys.ibm.pc.hardware
>>> 0 0 0 0 575 0 1 0 0
>>> 0 0 0 1 0 0 0 0 0 0
>>> 0 | 577 e = comp.sys.mac.hardware
>>> 0 2 2 2 0 593 1 0 0
>>> 0 0 0 0 0 1 0 0 0 0
>>> 0 | 601 f = comp.windows.x
>>> 0 0 0 1 0 0 589 1 0
>>> 0 1 0 2 0 0 0 0 0 0
>>> 0 | 594 g = misc.forsale
>>> 0 0 0 0 0 0 0 594 0
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 594 h = rec.autos
>>> 0 0 0 0 0 0 0 0 611
>>> 0 0 0 0 0 0 0 0 0 0
>>> 0 | 611 i = rec.motorcycles
>>> 0 0 0 0 0 0 0 0 0
>>> 616 1 0 0 0 0 0 0 0 0
>>> 0 | 617 j = rec.sport.baseball
>>> 0 0 0 0 0 0 1 0 0
>>> 0 620 0 0 0 0 0 0 0 0
>>> 0 | 621 k = rec.sport.hockey
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 580 0 0 0 0 0 1 0
>>> 0 | 581 l = sci.crypt
>>> 0 0 0 3 1 0 0 0 0
>>> 0 0 0 571 0 0 0 0 0 0
>>> 0 | 575 m = sci.electronics
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 2 583 0 0 0 0 0
>>> 0 | 585 n = sci.med
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 1 599 0 0 0 0
>>> 0 | 600 o = sci.space
>>> 0 1 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 615 0 0 0
>>> 0 | 616 p = soc.religion.christian
>>> 1 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 1 560 0 0
>>> 0 | 562 q = talk.politics.mideast
>>> 0 0 1 0 0 0 0 0 0
>>> 0 0 1 0 0 0 0 0 548 0
>>> 1 | 551 r = talk.politics.guns
>>> 10 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 1 1 0 2 344
>>> 1 | 359 s = talk.religion.misc
>>> 0 0 0 0 0 0 0 0 0
>>> 0 0 1 1 0 0 0 0 2 0
>>> 462 | 466 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.9847
>>> Accuracy 99.0869%
>>> Reliability 94.3334%
>>> Reliability (standard deviation) 0.2169
>>>
>>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 14304 ms (Minutes: 0.2384)
>>> + echo 'Testing on holdout set'
>>> Testing on holdout set
>>>
>>> [snip]
>>>
>>> INFO: Standard NB Results:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 6718 90.1019%
>>> Incorrectly Classified Instances : 738 9.8981%
>>> Total Classified Instances : 7456
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 294 0 0 0 0 0 0 0 0
>>> 0 0 2 0 1 1 6 1 1 16
>>> 0 | 322 a = alt.atheism
>>> 0 345 6 14 6 11 6 0 0
>>> 0 0 5 7 1 3 0 0 0 0
>>> 0 | 404 b = comp.graphics
>>> 2 29 177 78 22 19 9 1 0
>>> 0 0 4 2 0 1 1 0 0 1
>>> 1 | 347 c = comp.os.ms-windows.misc
>>> 1 9 2 335 18 2 10 0 0
>>> 0 1 0 8 0 0 0 0 0 0
>>> 0 | 386 d = comp.sys.ibm.pc.hardware
>>> 1 4 2 13 347 3 5 1 0
>>> 0 1 0 7 1 0 0 0 1 0
>>> 0 | 386 e = comp.sys.mac.hardware
>>> 0 20 0 4 0 352 4 0 0
>>> 0 0 0 1 1 3 0 1 0 1
>>> 0 | 387 f = comp.windows.x
>>> 0 2 0 21 5 1 323 7 2
>>> 2 0 2 12 0 3 0 0 0 0
>>> 1 | 381 g = misc.forsale
>>> 0 1 0 0 1 0 15 363 8
>>> 1 0 0 4 1 0 0 0 1 0
>>> 1 | 396 h = rec.autos
>>> 0 1 0 0 0 0 6 6 370
>>> 0 0 0 0 1 0 0 0 0 1
>>> 0 | 385 i = rec.motorcycles
>>> 1 0 0 1 1 0 2 1 2
>>> 362 5 0 2 0 0 0 0 0 0
>>> 0 | 377 j = rec.sport.baseball
>>> 0 0 0 1 2 0 0 0 0
>>> 3 371 0 0 0 0 0 0 0 0
>>> 1 | 378 k = rec.sport.hockey
>>> 0 3 1 0 1 0 2 0 0
>>> 0 0 396 0 1 0 0 1 1 1
>>> 3 | 410 l = sci.crypt
>>> 0 7 0 7 7 2 6 4 0
>>> 0 0 1 369 2 2 0 0 0 0
>>> 2 | 409 m = sci.electronics
>>> 0 3 0 2 1 0 2 0 0
>>> 0 0 1 4 383 4 0 0 1 0
>>> 4 | 405 n = sci.med
>>> 0 5 0 0 1 0 3 0 0
>>> 0 0 0 1 0 374 1 0 0 1
>>> 1 | 387 o = sci.space
>>> 6 2 0 1 1 0 0 1 0
>>> 1 0 0 1 5 0 352 2 1 7
>>> 1 | 381 p = soc.religion.christian
>>> 1 1 0 0 0 0 0 0 0
>>> 0 1 0 0 0 0 0 373 1 0
>>> 1 | 378 q = talk.politics.mideast
>>> 0 0 0 0 0 0 1 0 1
>>> 0 0 2 0 0 0 0 0 346 2
>>> 7 | 359 r = talk.politics.guns
>>> 26 1 0 1 0 0 0 2 0
>>> 1 1 0 0 1 1 20 2 6 200
>>> 7 | 269 s = talk.religion.misc
>>> 1 0 0 0 0 0 0 2 0
>>> 0 1 0 0 2 2 0 1 14 0
>>> 286 | 309 t = talk.politics.misc
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.8726
>>> Accuracy 90.1019%
>>> Reliability 85.4491%
>>> Reliability (standard deviation) 0.2222
>>>
>>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>>
>>> *SGD:*
>>> 7532 test files
>>>
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 5649 75%
>>> Incorrectly Classified Instances : 1883 25%
>>> Total Classified Instances : 7532
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h i
>>> j k l m n o p q r s
>>> t <--Classified as
>>> 186 6 3 10 5 0 33 4 13
>>> 15 7 1 24 15 3 15 5 5 29
>>> 15 | 394 a = sci.space
>>> 5 309 0 3 2 5 0 0 0
>>> 1 9 21 2 0 0 18 4 4 1
>>> 1 | 385 b = comp.sys.mac.hardware
>>> 4 1 101 3 0 1 63 0 7
>>> 0 1 1 5 16 3 0 3 7 1
>>> 34 | 251 c = talk.religion.misc
>>> 11 12 1 265 1 10 3 0 0
>>> 17 10 11 5 2 0 11 3 6 21
>>> 0 | 389 d = comp.graphics
>>> 2 1 1 0 349 2 3 0 3
>>> 2 6 1 5 1 0 2 15 2 1
>>> 2 | 398 e = rec.motorcycles
>>> 7 20 3 19 2 254 6 0 2
>>> 11 2 39 7 2 0 4 2 2 9
>>> 3 | 394 f = comp.os.ms-windows.misc
>>> 2 1 13 0 0 0 247 0 1
>>> 1 3 0 6 2 4 0 2 3 5
>>> 29 | 319 g = alt.atheism
>>> 1 1 0 0 2 0 2 361 0
>>> 1 2 0 2 0 0 1 3 22 0
>>> 1 | 399 h = rec.sport.hockey
>>> 3 0 3 1 0 0 5 0 161
>>> 0 1 2 12 102 0 0 1 2 11
>>> 6 | 310 i = talk.politics.misc
>>> 2 8 0 19 0 19 0 0 1
>>> 294 10 11 4 2 0 5 0 3 11
>>> 6 | 395 j = comp.windows.x
>>> 2 10 0 1 1 0 0 0 0
>>> 1 347 13 2 1 0 5 3 2 2
>>> 0 | 390 k = misc.forsale
>>> 1 36 0 6 1 25 0 0 1
>>> 6 10 257 2 1 0 34 6 0 6
>>> 0 | 392 l = comp.sys.ibm.pc.hardware
>>> 2 2 2 2 1 0 12 0 0
>>> 6 10 4 312 5 2 13 11 3 3
>>> 6 | 396 m = sci.med
>>> 2 0 3 2 1 0 0 1 13
>>> 0 5 1 2 314 2 0 2 2 10
>>> 4 | 364 n = talk.politics.guns
>>> 1 0 2 1 1 0 34 1 33
>>> 1 3 0 1 8 271 1 4 5 6
>>> 3 | 376 o = talk.politics.mideast
>>> 3 14 0 8 2 8 3 1 1
>>> 7 12 29 6 2 1 245 13 2 32
>>> 4 | 393 p = sci.electronics
>>> 3 3 0 2 11 0 1 0 2
>>> 1 11 6 4 2 0 11 330 4 4
>>> 1 | 396 q = rec.autos
>>> 0 0 1 0 1 0 4 12 3
>>> 1 3 0 0 0 0 5 6 359 1
>>> 1 | 397 r = rec.sport.baseball
>>> 0 1 0 0 0 1 0 0 3
>>> 3 0 0 3 2 1 6 1 6 366
>>> 3 | 396 s = sci.crypt
>>> 0 2 11 1 1 0 40 0 1
>>> 2 3 4 2 1 0 5 0 2 2
>>> 321 | 398 t = soc.religion.christian
>>>
>>> =======================================================
>>> Statistics
>>> -------------------------------------------------------
>>> Kappa 0.7073
>>> Accuracy 75%
>>> Reliability 70.6238%
>>> Reliability (standard deviation) 0.2187
>>> Log-likelihood mean : -1.1182
>>> 25%-ile : -1.6911
>>> 75%-ile : -0.0803
>>>
>>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>>
>>>
>>>
>>>
>>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>>
>>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
>>>> and a few other issues.
>>>>
>>>> We have removed asf-examples*.sh from trunk as the sample file at the
>>>> URL mentioned in your email is not available.
>>>> This is something we need to fix and restore in 1.0.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <
>>>> ap.dev@outlook.com> wrote:
>>>>
>>>> from the asf-email-examples.sh script:
>>>>
>>>> # You will need to download or otherwise obtain some or all of the
>>>> # Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
>>>> # to use this script.
>>>> # To obtain a full copy you will need to launch an EC2 instance and
>>>> # mount the dataset to download it, otherwise you can get a sample of it at
>>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> It looks like the:
>>>>
>>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>>
>>>> link is down.
>>>>
>>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>>
>>>>
>>>>
>>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>>> > From: andrew.musselman@gmail.com
>>>> > To: dev@mahout.apache.org
>>>> >
>>>> > Sure thing; continuing to smoke test the other examples tonight
>>>> >
>>>> >
>>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com>wrote:
>>>> >
>>>> > > Thanks, Andrew M. I see that some of the example scripts need to be fixed,
>>>> > > as they still refer to the deprecated algorithms.
>>>> > > I see that Streaming KMeans has failed for you as well.
>>>> > >
>>>> > > I'll be rolling back the release today to fix these issues.
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
>>>> 64-bit
>>>> > > Linux AMI from tarball.
>>>> > >
>>>> > > All tests pass.
>>>> > >
>>>> > > *Output of examples:*
>>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>>> > > <http://mahout.apache.org>:*
>>>> > > *recommendations:*
>>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
>>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
>>>> > > 1
>>>> > >
>>>> > >
>>>> [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>>> > > 4
>>>> > >
>>>> > >
>>>> [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>>> > > 6
>>>> > >
>>>> > >
>>>> [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>>> > > 8
>>>> > > [12758:1.0,19409:1.0,11112:1.0]
>>>> > > 11
>>>> > >
>>>> > >
>>>> [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>>> > > 14
>>>> > >
>>>> > >
>>>> [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>>> > > 15
>>>> > >
>>>> > >
>>>> [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>>> > > 16
>>>> > >
>>>> > >
>>>> [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>>> > > 18
>>>> > >
>>>> > >
>>>> [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>>> > > 20
>>>> > >
>>>> > >
>>>> [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>>> > > [snip]
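For anyone who wants to inspect the recommendations output above programmatically, here is a minimal Python sketch. It assumes the layout seen in the sample lines (a user ID, whitespace, then a bracketed comma-separated list of itemID:score pairs); `parse_recommendations` is a hypothetical helper name.

```python
def parse_recommendations(line):
    """Split one 'userID [item:score,...]' line into (user, [(item, score), ...])."""
    user, items = line.split(None, 1)
    pairs = []
    for entry in items.strip().strip("[]").split(","):
        if entry:
            item, score = entry.split(":")
            pairs.append((int(item), float(score)))
    return int(user), pairs


user, recs = parse_recommendations("8\t[12758:1.0,19409:1.0,11112:1.0]")
print(user, recs)
```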
>>>> > >
>>>> > > *clustering; kmeans:*
>>>> > > [snip]
>>>> > > Weight : [props - optional]: Point:
>>>> > > 1.0 :
>>>> > > [distance-squared=1.0193102046188427]:
>>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
>>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>>> 7573:0.204,
>>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>>> 9779:0.159,
>>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>>> > > 1.0 : [distance-squared=0.9823018320457279]:
>>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>>> 5336:0.106,
>>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>>> 7832:0.072,
>>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>>> > > 1.0 : [distance-squared=0.9509142993214911]:
>>>> > >
>>>> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
>>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
>>>> > > 4419:0.076,
>>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>>>> 7235:0.048,
>>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>>> 7683:0.077,
>>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>>> 10225:0.081,
>>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
>>>> > > 43685:0.086, 44077:0.308,
>>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>>> > > [snip]
>>>> > >
>>>> > > *clustering; dirichlet:*
>>>> > > Get this complaint:
>>>> > > Running Dirichlet with K = 8
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
>>>> > > classpath, will use command-line arguments only
>>>> > > Unknown program 'dirichlet' chosen.
>>>> > >
>>>> > > *clustering: minhash:*
>>>> > > Running Minhash
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
>>>> > > classpath, will use command-line arguments only
>>>> > > Unknown program 'minhash' chosen.
>>>> > >
>>>> > > *classification; standard:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances : 5384 87.7874%
>>>> > > Incorrectly Classified Instances : 749 12.2126%
>>>> > > Total Classified Instances : 6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a b c d <--Classified as
>>>> > > 2949 7 531 25 | 3512 a = dev
>>>> > > 0 0 0 0 | 0 b = general
>>>> > > 99 8 1763 8 | 1878 c = user
>>>> > > 41 1 29 672 | 743 d = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa 0.7877
>>>> > > Accuracy 87.7874%
>>>> > > Reliability 53.658%
>>>> > > Reliability (standard deviation) 0.4911
>>>> > >
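As a quick sanity check on the summary above, the accuracy figure can be recomputed from the confusion matrix: the diagonal holds the correctly classified instances and the sum of all cells is the total. A small Python sketch, with the matrix values copied from the report (the variable and function names are only illustrative):

```python
# Rows are the actual classes, columns are the predicted classes.
matrix = [
    [2949, 7, 531, 25],   # a = dev
    [0, 0, 0, 0],         # b = general (no instances in this test set)
    [99, 8, 1763, 8],     # c = user
    [41, 1, 29, 672],     # d = commits
]


def accuracy_percent(m):
    """Percentage of instances on the diagonal of a square confusion matrix."""
    correct = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return 100.0 * correct / total


print(round(accuracy_percent(matrix), 4))
```

The diagonal sums to 5384 and the whole matrix to 6133, matching the "Correctly Classified Instances" and "Total Classified Instances" lines.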
>>>> > > *classification; complementary:*
>>>> > > =======================================================
>>>> > > Summary
>>>> > > -------------------------------------------------------
>>>> > > Correctly Classified Instances : 5530 90.1679%
>>>> > > Incorrectly Classified Instances : 603 9.8321%
>>>> > > Total Classified Instances : 6133
>>>> > >
>>>> > > =======================================================
>>>> > > Confusion Matrix
>>>> > > -------------------------------------------------------
>>>> > > a b c d <--Classified as
>>>> > > 3168 0 276 68 | 3512 a = dev
>>>> > > 0 0 0 0 | 0 b = general
>>>> > > 196 0 1652 30 | 1878 c = user
>>>> > > 25 0 8 710 | 743 d = commits
>>>> > >
>>>> > > =======================================================
>>>> > > Statistics
>>>> > > -------------------------------------------------------
>>>> > > Kappa 0.8259
>>>> > > Accuracy 90.1679%
>>>> > > Reliability 54.7459%
>>>> > > Reliability (standard deviation) 0.5005
>>>> > >
>>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms
>>>> (Minutes:
>>>> > > 0.34836666666666666)
>>>> > >
>>>> > > *classification; sgd, with three categories:*
>>>> > > Running SGD Training
>>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>>> > > HADOOP_CONF_DIR=
>>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
>>>> > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
>>>> > > will use command-line arguments only
>>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
>>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
>>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
>>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
>>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>>> > > 24168 training files
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000
>>>> > > 2
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00
>>>> > > 0.00 0.00 0.0000000 0.0000000 12
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000
>>>> > > 0.0000000 40
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100
>>>> > > 0.000
>>>> > > 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00
>>>> > > 0.00 0.00 0.0000000 0.0000000 300
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700
>>>> > > 0.000 0.00 none
>>>> > > 0.00 0.00 0.00 0.00 0.0000000
>>>> > > 0.0000000 800
>>>> > > 0.000 0.00 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1000 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1200 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1400 -0.607 75.78 none
>>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
>>>> > > 1.0019413e-08 1500 -0.607 75.78 none
>>>> > > 0.24 43686.00 17924.00 329.50
>>>> > > 1.0571799e-08
>>>> > > 1.0032261e-08 2000 -0.487 82.65 none
>>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
>>>> > > 1.0011902e-08 2500 -0.439 83.90 none
>>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
>>>> > > 1.0011902e-08 3000 -0.439 83.90 none
>>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08
>>>> > > 1.0000001e-08 4000 -0.351 88.14 none
>>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08
>>>> > > 1.0000000e-08 5000 -0.378 87.10 none
>>>> > > 0.32 50635.00 36461.00 437.09
>>>> > > 1.0556652e-08
>>>> > > 1.0000001e-08 6000 -0.372 86.89 none
>>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08
>>>> > > 1.0000001e-08 7000 -0.334 89.26 none
>>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08
>>>> > > 1.0000000e-08 8000 -0.368 87.52 none
>>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08
>>>> > > 1.0000000e-08 10000 -0.374 87.39 none
>>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08
>>>> > > 1.0000000e-08 12000 -0.298 88.26 none
>>>> > > Exception in thread "main" java.lang.IllegalStateException:
>>>> > > java.lang.ArrayIndexOutOfBoundsException:
>>>> > > 2
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>>> > > at
>>>> > >
>>>> org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>>> > > at
>>>> > >
>>>> org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> Method)
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > >
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>> > > at
>>>> > > org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>> > > at
>>>> > > org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>> Method)
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> > > at
>>>> > >
>>>> > >
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
>>>> > > at
>>>> > > org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>>> > > at
>>>> > > org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>>> > >
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>>> > > at
>>>> > >
>>>> > >
>>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>>> > > at
>>>> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>> > > at
>>>> > >
>>>> > >
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>> > > at
>>>> > >
>>>> > >
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> > > at java.lang.Thread.run(Thread.java:701)
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>>> > > andrew.musselman@gmail.com> wrote:
>>>> > >
>>>> > > > Trying out the build today
>>>> > > >
>>>> > > >
>>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>>> suneel_marthi@yahoo.com
>>>> > > >wrote:
>>>> > > >
>>>> > > >> This is an issue (trivial one though) that needs to be fixed for
>>>> 0.9
>>>> > > >> Release, will be rerolling the release today (in the next few
>>>> hrs) and
>>>> > > >> putting out a new release candidate in staging.
>>>> > > >>
>>>> > > >> Thanks for reporting this Andrew P.
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>>> > > ap.dev@outlook.com>
>>>> > > >> wrote:
>>>> > > >>
>>>> > > >> I ran through the tests on a CentOS VM
>>>> > > AMD64 2 cores 4 GB RAM. Had
>>>> > > >> a bit of trouble getting the Hadoop natives to compile and
>>>> therefore may
>>>> > > >> have run into some problems because of the hadoop setup. Ran
>>>> into some
>>>> > > >> problems in the example scripts. Particularly with
>>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the
>>>> rest of the
>>>> > > >> examples when I'm sure I've got Hadoop set up right.
>>>> > > >>
>>>> > > >>
>>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
>>>> "amd64",
>>>> > > >> family: "unix"
>>>> > > >> $MAHOUT_LOCAL=true
>>>> > > >> Hadoop 2.2.0
>>>> > > >>
>>>> > > >>
>>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed
>>>> (tar)
>>>> > > >> [passed ]
>>>> > > >>
>>>> > > >> b) Verify u r able to compile the
>>>> > > distro
>>>> > > >>
>>>> > > >> mvn compile- [passed with warnings]
>>>> > > >>
>>>> > > >> [WARNING] Expected all dependencies to require Scala
>>>> version: 2.9.3
>>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires
>>>> scala
>>>> > > >> version: 2.9.3
>>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala
>>>> > > >> version: 2.9.2
>>>> > > >> [WARNING] Multiple versions of scala libraries detected!
>>>> > > >>
>>>> > > >> c) Run through the unit tests: mvn clean test
>>>> > > >> mvn clean test [passed]
>>>> > > >>
>>>> > > >> d) Run the
>>>> > > >> example scripts under $MAHOUT_HOME/examples/bin.
>>>> > > >> Please run through all the different options in each script
>>>> > > >>
>>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
>>>> > > >>
>>>> > > >>
>>>> > > ./cluster-syntheticcontrol.sh ->1 [works]
>>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
>>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>>> > > >> [...]
>>>> > > >> WARNING: Unable to add class:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >> java.lang.ClassNotFoundException:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>>> > > >> at
>>>> > > >> java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >> at java.security.AccessController.doPrivileged(Native
>>>> Method)
>>>> > > >> at
>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >> at
>>>> > > java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >> at
>>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >> at java.lang.Class.forName0(Native Method)
>>>> > > >> at java.lang.Class.forName(Class.java:171)
>>>> > > >> at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >> at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >>
>>>> > > >>
>>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>>> > > >>
>>>> > > >> WARNING: Unable to add class:
>>>> > > >>
>>>> > > org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >> java.lang.ClassNotFoundException:
>>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>> > > >> at java.security.AccessController.doPrivileged(Native
>>>> Method)
>>>> > > >> at
>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>>> > > >> at
>>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>>> > > >> at java.lang.Class.forName0(Native Method)
>>>> > > >> at
>>>> > > java.lang.Class.forName(Class.java:171)
>>>> > > >> at
>>>> > > >>
>>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>>> > > >> at
>>>> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>>> > > >> WARNING: No
>>>> > > >>
>>>> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found
>>>> > > on
>>>> > > >> classpath, will use command-line arguments only
>>>> > > >> Unknown program
>>>> > > >> 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job'
>>>> chosen.
>>>> > > >>
>>>> > > >>
>>>> > > >> ./classify-20newsgroups.sh ->1 [works]
>>>> > > >> ./classify-20newsgroups.sh ->2 [works]
>>>> > > >>
>>>> > > >>
>>>> > > >> cluster-reuters.sh ->1 [works]
>>>> > > >>
>>>> > > cluster-reuters.sh ->2 [works]
>>>> > > >> cluster-reuters.sh ->3 [works]
>>>> > > >>
>>>> > > >> Same error as noted previously in the thread:
>>>> > > >>
>>>> > > >> cluster-reuters.sh ->4 [0 clusters]
>>>> > > >>
>>>> > > >> [...]
>>>> > > >>
>>>> > > >> WARNING: No qualcluster.props found on classpath, will use
>>>> > > >> command-line arguments only
>>>> > > >> Num clusters: 0; maxDistance: 0.000000
>>>> > > >> [Dunn Index]
>>>> > > >> First: Infinity
>>>> > > >> [Davies-Bouldin Index] First: NaN
>>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
>>>> > > >> cluster,distance.mean,distance.sd
>>>> > > >>
>>>> > >
>>>> > >
>>>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>>>> > > >> > From: suneel_marthi@yahoo.com
>>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>>>> > > >> >
>>>> > > >> > Third time's a Charm!!!
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Here's the new URL for Mahout 0.9 Release:
>>>> > > >> >
>>>> > > >>
>>>> > >
>>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>>> > > >> >
>>>> > > >> > For those volunteering to test this, some of the things to be
>>>> > > verified:
>>>> > > >> >
>>>> > > >> > a) Verify that u can unpack the release (tar or zip)
>>>> > > >> > b) Verify u r able to compile the distro
>>>> > > >> > c) Run through the unit tests: mvn clean test
>>>> > > >> > d) Run the example scripts
>>>> > > >> under $MAHOUT_HOME/examples/bin. Please run through all the
>>>> different
>>>> > > >> options in each script.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Committers
>>>> > > >> > and PMC members:
>>>> > > >> > ---------------------------------------
>>>> > > >> >
>>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > Thanks and
>>>> > > Regards.
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > >
>>>>
>>>
>>>
>>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
*cluster-syntheticcontrol.sh*
*Canopy:*
[snip]
1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
10.017, 17.149, 14.850, 10.890]
1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
16.430, 11.898, 15.366]
1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
15.285, 22.528, 20.657, 24.129]
1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
20.229, 11.131, 9.980, 10.720]
1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
16.546, 15.927, 18.084, 17.475]
1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
19.310, 12.999, 17.460]
1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
11.743, 11.699, 10.152]
Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Wrote 6 clusters
Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
*K-means:*
[snip]
1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
36.946, 36.900, 32.593, 31.563]
1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
18.967, 35.473, 30.146, 45.527]
1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
41.024, 37.105, 45.623, 43.589]
1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
24.508, 34.432, 32.155, 34.839]
1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
18.111, 14.754, 27.386, 27.026]
1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
39.404, 38.034, 46.458, 44.432]
1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
28.215, 34.940, 36.910, 39.749]
Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Wrote 6 clusters
Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 16902 ms (Minutes: 0.2817)
*Fuzzy k-means:*
[snip]
1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
15.285, 22.528, 20.657, 24.129]
1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
20.229, 11.131, 9.980, 10.720]
1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
16.546, 15.927, 18.084, 17.475]
1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
19.310, 12.999, 17.460]
1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
11.743, 11.699, 10.152]
Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Wrote 6 clusters
Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
*Dirichlet and Meanshift:*
Already detailed in M-1400: the script still references the deprecated jobs.
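For anyone re-checking this against a captured run, here is a minimal sketch (not part of Mahout) that pulls the class names reported in ClassNotFoundException lines out of a saved log. The function name `missing_classes` and the `sample` text are hypothetical; the sample only mirrors the shape of the log lines quoted earlier in this thread, where the class name wraps onto the line after the exception.

```python
import re

# Matches "java.lang.ClassNotFoundException:" followed by a class name.
# \s* lets the name wrap onto the next line, as it does in the quoted logs.
_CNFE = re.compile(r"java\.lang\.ClassNotFoundException:\s*([\w.$]+)")


def missing_classes(log_text):
    """Return distinct class names reported as not found, in first-seen order."""
    seen = []
    for name in _CNFE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical sample shaped like the log excerpts quoted in this thread.
sample = """WARNING: Unable to add class:
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
"""

for cls in missing_classes(sample):
    print(cls)
```

Running it over the full output of each script option should give a quick list of which jobs the example scripts still point at.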
On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:
> *cluster-reuters.sh*
> *k-means:*
>
> [snip]
> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
> 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> Top Terms:
> banks =>
> 3.841823268955143
> bank =>
> 3.80633066361209
> debt =>
> 3.28065219870794
> said =>
> 2.5965700942088583
> he =>
> 2.335682813857497
> foreign =>
> 2.2217853688201403
> billion =>
> 2.1970193848291335
> would =>
> 1.9932392063955617
> loans =>
> 1.9309276792854233
> interest =>
> 1.787324501938
> have =>
> 1.762981951432578
> its =>
> 1.7615109954971866
> which =>
> 1.5822081148036862
> has =>
> 1.5600708189041956
> dlrs =>
> 1.5571038313005996
> finance =>
> 1.5539758811252924
> new =>
> 1.5176015811577555
> had =>
> 1.5138723701401844
> brazil =>
> 1.5083369853593172
> payments =>
> 1.4539044255886517
> Weight : [props - optional]: Point:
>
> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
> 0.40:0.003, 0.5:0.009, 0.57:0
> Top Terms:
> vs =>
> 6.126130791333171
> net =>
> 4.012191567277523
> cts =>
> 3.822006848832744
> shr =>
> 3.6786004856764527
> mln =>
> 2.9011643584038698
> loss =>
> 2.788368861463607
> qtr =>
> 2.714140225051522
> revs =>
> 2.4739861236454717
> profit =>
> 1.8146888090247015
> note =>
> 1.7977163272138388
> dlrs =>
> 1.6164390808155846
> avg =>
> 1.3901765773336587
> shrs =>
> 1.3856326531419314
> mths =>
> 1.3168717272038506
> 4th =>
> 1.2161158425617289
> oper =>
> 1.182419473776814
> year =>
> 1.178086061733047
> nine =>
> 1.0670554836445316
> 3rd =>
> 1.041334410056592
> inc =>
> 1.0019361981554935
> Weight : [props - optional]: Point:
>
>
> Inter-Cluster Density: 0.45562152681859414
> Intra-Cluster Density: 0.6952712632167628
> CDbw Inter-Cluster Density: 0.0
> CDbw Intra-Cluster Density: 16.486930227598684
> CDbw Separation: 194.49005884464628
>
> *fuzzy k-means:*
> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> 0.01:0.005, 0.02:0.002, 0.0
> Top Terms:
> said =>
> 1.8665592354713065
> its =>
> 1.1335212213411592
> pct =>
> 1.0862816801353348
> dlrs =>
> 1.0854998884993752
> mln =>
> 1.043163996400643
> from =>
> 0.9684961110525736
> has =>
> 0.912161511978058
> company =>
> 0.8754186972808333
> mar =>
> 0.8675333452422878
> inc =>
> 0.7678617590362815
> would =>
> 0.7610968883652675
> he =>
> 0.7459988770503974
> which =>
> 0.7435613119406804
> year =>
> 0.7302840632748394
> u.s =>
> 0.7281061062439116
> shares =>
> 0.7260764102983083
> corp =>
> 0.7179807367808658
> new =>
> 0.7044203783157115
> stock =>
> 0.6962010978721442
> have =>
> 0.6464265467298506
> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> 0.01:0.004, 0.02:0.002, 0.02
> Top Terms:
> said =>
> 1.864911184196927
> dlrs =>
> 1.199286689822081
> mln =>
> 1.1802134783562215
> pct =>
> 1.1529704214798124
> its =>
> 1.1184398851519701
> from =>
> 1.016647848050332
> company =>
> 0.894703604722841
> mar =>
> 0.879986159541356
> has =>
> 0.8642799128491316
> year =>
> 0.8271823503717782
> inc =>
> 0.7871293745341424
> corp =>
> 0.737705498468879
> which =>
> 0.722975201852743
> would =>
> 0.708000816484415
> u.s =>
> 0.7073294276173905
> billion =>
> 0.7055723996916351
> he =>
> 0.7042684217823294
> new =>
> 0.6834737905434939
> shares =>
> 0.6753327384172428
> stock =>
> 0.6576225144041699
> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
> 0.01:0.006, 0.02:0.002, 0.02
> Top Terms:
> said =>
> 1.8796076179735086
> its =>
> 1.172025965452378
> dlrs =>
> 1.130422792460914
> pct =>
> 1.082038255241358
> mln =>
> 1.0772146872767114
> company =>
> 0.9662235879639138
> from =>
> 0.9473172871605616
> has =>
> 0.9224712965830099
> mar =>
> 0.8769325856924421
> inc =>
> 0.8360245257169788
> shares =>
> 0.8334595641384324
> stock =>
> 0.7704621839612175
> corp =>
> 0.7682400250301806
> which =>
> 0.7389988207856137
> would =>
> 0.7339708917389389
> year =>
> 0.7088414843731325
> new =>
> 0.7038109468655172
> he =>
> 0.6993994455501005
> u.s =>
> 0.6772649147622415
> share =>
> 0.6241804830055171
>
> *lda:*
>
> [snip]
> 21539
> {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> 21540
> {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> 21541
> {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> 21542
> {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> 21543
> {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> 21544
> {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> 21545
> {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> 21546
> {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> 21547
> {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> 21548
> {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
> 21549
> {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
> 21550
> {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
> 21551
> {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
> 21552
> {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
> 21553
> {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
> 21554
> {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
> 21555
> {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
> 21556
> {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
> 21557
> {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
> 21558
> {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
> 21559
> {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
> 21560
> {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
> 21561
> {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
> 21562
> {0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
> 21563
> {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
> 21564
> {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
> 21565
> {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
> 21566
> {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
> 21567
> {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
> 21568
> {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
> 21569
> {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
> 21570
> {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
> 21571
> {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
> 21572
> {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
> 21573
> {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
> 21574
> {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
> 21575
> {0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
> 21576
> {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
> 21577
> {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
>
>
> *Streaming k-means:*
>
> [snip]
> INFO: Number of Centroids: 0
> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local23982482_0001
> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
> at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
> at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>
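The exception text in the trace above still shows the raw `%.1f %% of %d` placeholders, with the two values trailing in square brackets. That is consistent with Guava's `Preconditions.checkArgument`, which substitutes only `%s` placeholders and appends any unused arguments in brackets. As a rough sketch (plain Python printf-style formatting, not Mahout's code; the `render` helper is hypothetical), this is what the message was presumably meant to read with the values from the log:

```python
# Template and values copied from the logged exception message.
template = ("Must have nonzero number of training and test vectors. "
            "Asked for %.1f %% of %d vectors for test")
values = (10.000000149011612, 0)


def render(template, values):
    """Apply printf-style formatting to the template (illustrative helper)."""
    return template % values


message = render(template, values)
print(message)
```

Read this way, the reducer asked for a 10% test split of zero vectors, which lines up with the `Number of Centroids: 0` line logged just before the failure.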
> [snip]
>
> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> Num clusters: 0; maxDistance: 0.000000
> [Dunn Index] First: Infinity
> [Davies-Bouldin Index] First: NaN
> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 535 ms (Minutes: 0.008916666666666666)
> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>
>
> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
>> *classify-20newsgroups.sh*
>>
>> *Complementary naive bayes:*
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances : 11207 98.9406%
>> Incorrectly Classified Instances : 120 1.0594%
>> Total Classified Instances : 11327
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a b c d e f g h i j
>> k l m n o p q r s
>> t <--Classified as
>> 475 0 0 1 0 0 0 0 0 0
>> 0 0 0 0 1 0 1 0 0
>> 0 | 478 a = alt.atheism
>> 0 597 1 1 0 1 1 0 0 0
>> 0 1 0 2 1 0 0 0 0
>> 0 | 605 b = comp.graphics
>> 0 1 620 3 0 1 0 0 0 0
>> 0 1 0 0 1 0 0 0 0
>> 0 | 627 c = comp.os.ms-windows.misc
>> 1 1 1 593 2 0 0 0 0 0
>> 0 0 0 0 0 1 0 0 0
>> 0 | 599 d = comp.sys.ibm.pc.hardware
>> 0 1 1 0 568 0 1 0 0 0
>> 1 1 2 0 0 0 0 1 0
>> 0 | 576 e = comp.sys.mac.hardware
>> 0 4 2 0 0 581 0 0 0 0
>> 0 0 0 0 0 0 0 0 0
>> 0 | 587 f = comp.windows.x
>> 0 0 0 1 2 0 571 3 0 0
>> 1 1 4 1 0 0 0 0 0
>> 0 | 584 g = misc.forsale
>> 0 0 0 1 0 0 0 589 1 0
>> 0 1 1 0 0 0 0 0 0
>> 0 | 593 h = rec.autos
>> 0 0 0 0 0 0 0 1 565 0
>> 0 0 0 0 1 0 0 0 0
>> 0 | 567 i = rec.motorcycles
>> 0 0 0 0 0 0 0 0 0
>> 600 2 0 0 0 1 0 0 0 0
>> 0 | 603 j = rec.sport.baseball
>> 0 0 0 0 0 0 0 0 0 1
>> 584 0 0 0 0 0 0 0 0
>> 0 | 585 k = rec.sport.hockey
>> 0 0 0 0 0 0 0 0 0 0
>> 0 579 0 0 0 0 0 1 0
>> 0 | 580 l = sci.crypt
>> 0 0 0 1 3 0 2 0 0 2
>> 0 0 567 1 2 1 0 0 0
>> 0 | 579 m = sci.electronics
>> 0 0 0 0 0 0 0 0 0 0
>> 0 0 1 605 0 0 0 0 0
>> 0 | 606 n = sci.med
>> 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 602 0 0 0 0
>> 0 | 602 o = sci.space
>> 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 1 0 602 0 0 1
>> 0 | 604 p = soc.religion.christian
>> 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 556 0 0
>> 0 | 556 q = talk.politics.mideast
>> 0 0 1 0 0 0 0 0 0 0
>> 0 1 0 0 1 0 0 568 0
>> 0 | 571 r = talk.politics.guns
>> 11 0 0 0 0 0 0 0 0 1
>> 0 0 0 1 3 8 1 4 338
>> 2 | 369 s = talk.religion.misc
>> 0 0 0 0 0 0 0 0 0 0
>> 1 0 0 0 1 0 3 4 0
>> 447 | 456 t = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa 0.9806
>> Accuracy 98.9406%
>> Reliability 94.0932%
>> Reliability (standard deviation) 0.2163
>>
>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 15870 ms (Minutes: 0.2645)
>> + echo 'Testing on holdout set'
>> Testing on holdout set
>> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m
>> /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow
>> -o /tmp/mahout-work-ec2-user/20news-testing -c
>>
>> [snip]
>>
>> INFO: Complementary Results:
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances : 6715 89.3071%
>> Incorrectly Classified Instances : 804 10.6929%
>> Total Classified Instances : 7519
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a b c d e f g h i j
>> k l m n o p q r s
>> t <--Classified as
>> 298 0 0 0 0 0 0 0 0 1
>> 0 0 0 1 2 5 1 0 13
>> 0 | 321 a = alt.atheism
>> 0 298 11 6 1 12 2 2 1 1
>> 3 8 3 4 2 4 1 4 4
>> 1 | 368 b = comp.graphics
>> 1 17 286 16 4 9 6 3 2 0
>> 1 0 1 7 1 0 2 1 0
>> 1 | 358 c = comp.os.ms-windows.misc
>> 2 6 11 309 9 5 14 8 1 0
>> 2 0 6 4 2 0 1 2 1
>> 0 | 383 d = comp.sys.ibm.pc.hardware
>> 0 10 8 7 334 7 5 5 2 0
>> 3 0 2 1 1 0 1 1 0
>> 0 | 387 e = comp.sys.mac.hardware
>> 1 13 7 8 2 355 2 0 2 0
>> 0 5 1 1 3 0 0 1 0
>> 0 | 401 f = comp.windows.x
>> 0 7 11 29 12 9 268 16 8 4
>> 3 2 6 4 2 1 3 1 2
>> 3 | 391 g = misc.forsale
>> 0 1 0 0 3 0 7 362 8 2
>> 2 1 2 0 2 0 1 2 0
>> 4 | 397 h = rec.autos
>> 0 0 0 1 0 0 1 0 423 0
>> 0 0 2 1 0 1 0 0 0
>> 0 | 429 i = rec.motorcycles
>> 0 0 1 0 0 0 0 2 2
>> 371 8 0 2 3 0 2 0 0 0
>> 0 | 391 j = rec.sport.baseball
>> 0 0 1 0 0 0 1 0 0 2
>> 409 0 0 0 0 0 0 0 0
>> 1 | 414 k = rec.sport.hockey
>> 0 0 1 2 1 0 1 0 0 0
>> 0 404 0 0 0 0 0 1 0
>> 1 | 411 l = sci.crypt
>> 0 5 4 11 1 3 7 9 2 5
>> 3 3 339 2 6 0 1 1 2
>> 1 | 405 m = sci.electronics
>> 0 4 0 1 0 0 0 1 0 1
>> 1 0 3 367 3 1 2 0 0
>> 0 | 384 n = sci.med
>> 0 1 2 0 1 0 2 0 0 1
>> 0 0 1 1 375 0 1 0 0
>> 0 | 385 o = sci.space
>> 4 2 1 1 0 0 1 1 2 0
>> 0 1 1 5 1 367 4 0 1
>> 1 | 393 p = soc.religion.christian
>> 0 1 0 0 0 0 0 0 0 2
>> 0 0 0 0 0 2 378 0 1
>> 0 | 384 q = talk.politics.mideast
>> 0 0 0 0 0 2 1 1 1 1
>> 0 3 0 3 0 0 2 319 2
>> 4 | 339 r = talk.politics.guns
>> 32 0 0 1 0 0 0 0 0 1
>> 1 1 0 2 2 26 5 7 175
>> 6 | 259 s = talk.religion.misc
>> 0 0 0 2 0 0 0 0 0 1
>> 2 2 0 1 2 1 10 18 2
>> 278 | 319 t = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa 0.8594
>> Accuracy 89.3071%
>> Reliability 84.611%
>> Reliability (standard deviation) 0.2148
>>
>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>>
>>
>> *Naive bayes:*
>> INFO: Standard NB Results:
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances : 11286 99.0869%
>> Incorrectly Classified Instances : 104 0.9131%
>> Total Classified Instances : 11390
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a b c d e f g h i j
>> k l m n o p q r s
>> t <--Classified as
>> 474 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 2
>> 1 | 477 a = alt.atheism
>> 0 566 0 2 0 1 0 0 0 0
>> 0 0 0 0 0 0 0 0 0
>> 0 | 569 b = comp.graphics
>> 0 10 590 29 2 4 1 0 0 0
>> 0 0 1 0 0 0 0 0 0
>> 1 | 638 c = comp.os.ms-windows.misc
>> 0 0 0 596 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0
>> 0 | 596 d = comp.sys.ibm.pc.hardware
>> 0 0 0 0 575 0 1 0 0 0
>> 0 0 1 0 0 0 0 0 0
>> 0 | 577 e = comp.sys.mac.hardware
>> 0 2 2 2 0 593 1 0 0 0
>> 0 0 0 0 1 0 0 0 0
>> 0 | 601 f = comp.windows.x
>> 0 0 0 1 0 0 589 1 0 0
>> 1 0 2 0 0 0 0 0 0
>> 0 | 594 g = misc.forsale
>> 0 0 0 0 0 0 0 594 0 0
>> 0 0 0 0 0 0 0 0 0
>> 0 | 594 h = rec.autos
>> 0 0 0 0 0 0 0 0 611 0
>> 0 0 0 0 0 0 0 0 0
>> 0 | 611 i = rec.motorcycles
>> 0 0 0 0 0 0 0 0 0
>> 616 1 0 0 0 0 0 0 0 0
>> 0 | 617 j = rec.sport.baseball
>> 0 0 0 0 0 0 1 0 0 0
>> 620 0 0 0 0 0 0 0 0
>> 0 | 621 k = rec.sport.hockey
>> 0 0 0 0 0 0 0 0 0 0
>> 0 580 0 0 0 0 0 1 0
>> 0 | 581 l = sci.crypt
>> 0 0 0 3 1 0 0 0 0 0
>> 0 0 571 0 0 0 0 0 0
>> 0 | 575 m = sci.electronics
>> 0 0 0 0 0 0 0 0 0 0
>> 0 0 2 583 0 0 0 0 0
>> 0 | 585 n = sci.med
>> 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 1 599 0 0 0 0
>> 0 | 600 o = sci.space
>> 0 1 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 615 0 0 0
>> 0 | 616 p = soc.religion.christian
>> 1 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 1 560 0 0
>> 0 | 562 q = talk.politics.mideast
>> 0 0 1 0 0 0 0 0 0 0
>> 0 1 0 0 0 0 0 548 0
>> 1 | 551 r = talk.politics.guns
>> 10 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 1 1 0 2 344
>> 1 | 359 s = talk.religion.misc
>> 0 0 0 0 0 0 0 0 0 0
>> 0 1 1 0 0 0 0 2 0
>> 462 | 466 t = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa 0.9847
>> Accuracy 99.0869%
>> Reliability 94.3334%
>> Reliability (standard deviation) 0.2169
>>
>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 14304 ms (Minutes: 0.2384)
>> + echo 'Testing on holdout set'
>> Testing on holdout set
>>
>> [snip]
>>
>> INFO: Standard NB Results:
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances : 6718 90.1019%
>> Incorrectly Classified Instances : 738 9.8981%
>> Total Classified Instances : 7456
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a b c d e f g h i j
>> k l m n o p q r s
>> t <--Classified as
>> 294 0 0 0 0 0 0 0 0 0
>> 0 2 0 1 1 6 1 1 16
>> 0 | 322 a = alt.atheism
>> 0 345 6 14 6 11 6 0 0 0
>> 0 5 7 1 3 0 0 0 0
>> 0 | 404 b = comp.graphics
>> 2 29 177 78 22 19 9 1 0 0
>> 0 4 2 0 1 1 0 0 1
>> 1 | 347 c = comp.os.ms-windows.misc
>> 1 9 2 335 18 2 10 0 0 0
>> 1 0 8 0 0 0 0 0 0
>> 0 | 386 d = comp.sys.ibm.pc.hardware
>> 1 4 2 13 347 3 5 1 0 0
>> 1 0 7 1 0 0 0 1 0
>> 0 | 386 e = comp.sys.mac.hardware
>> 0 20 0 4 0 352 4 0 0 0
>> 0 0 1 1 3 0 1 0 1
>> 0 | 387 f = comp.windows.x
>> 0 2 0 21 5 1 323 7 2 2
>> 0 2 12 0 3 0 0 0 0
>> 1 | 381 g = misc.forsale
>> 0 1 0 0 1 0 15 363 8 1
>> 0 0 4 1 0 0 0 1 0
>> 1 | 396 h = rec.autos
>> 0 1 0 0 0 0 6 6 370 0
>> 0 0 0 1 0 0 0 0 1
>> 0 | 385 i = rec.motorcycles
>> 1 0 0 1 1 0 2 1 2
>> 362 5 0 2 0 0 0 0 0 0
>> 0 | 377 j = rec.sport.baseball
>> 0 0 0 1 2 0 0 0 0 3
>> 371 0 0 0 0 0 0 0 0
>> 1 | 378 k = rec.sport.hockey
>> 0 3 1 0 1 0 2 0 0 0
>> 0 396 0 1 0 0 1 1 1
>> 3 | 410 l = sci.crypt
>> 0 7 0 7 7 2 6 4 0 0
>> 0 1 369 2 2 0 0 0 0
>> 2 | 409 m = sci.electronics
>> 0 3 0 2 1 0 2 0 0 0
>> 0 1 4 383 4 0 0 1 0
>> 4 | 405 n = sci.med
>> 0 5 0 0 1 0 3 0 0 0
>> 0 0 1 0 374 1 0 0 1
>> 1 | 387 o = sci.space
>> 6 2 0 1 1 0 0 1 0 1
>> 0 0 1 5 0 352 2 1 7
>> 1 | 381 p = soc.religion.christian
>> 1 1 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 373 1 0
>> 1 | 378 q = talk.politics.mideast
>> 0 0 0 0 0 0 1 0 1 0
>> 0 2 0 0 0 0 0 346 2
>> 7 | 359 r = talk.politics.guns
>> 26 1 0 1 0 0 0 2 0 1
>> 1 0 0 1 1 20 2 6 200
>> 7 | 269 s = talk.religion.misc
>> 1 0 0 0 0 0 0 2 0 0
>> 1 0 0 2 2 0 1 14 0
>> 286 | 309 t = talk.politics.misc
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa 0.8726
>> Accuracy 90.1019%
>> Reliability 85.4491%
>> Reliability (standard deviation) 0.2222
>>
>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 10878 ms (Minutes: 0.1813)
>>
>> *SGD:*
>> 7532 test files
>>
>> =======================================================
>> Summary
>> -------------------------------------------------------
>> Correctly Classified Instances : 5649 75%
>> Incorrectly Classified Instances : 1883 25%
>> Total Classified Instances : 7532
>>
>> =======================================================
>> Confusion Matrix
>> -------------------------------------------------------
>> a b c d e f g h i j
>> k l m n o p q r s
>> t <--Classified as
>> 186 6 3 10 5 0 33 4 13
>> 15 7 1 24 15 3 15 5 5 29
>> 15 | 394 a = sci.space
>> 5 309 0 3 2 5 0 0 0 1
>> 9 21 2 0 0 18 4 4 1
>> 1 | 385 b = comp.sys.mac.hardware
>> 4 1 101 3 0 1 63 0 7 0
>> 1 1 5 16 3 0 3 7 1
>> 34 | 251 c = talk.religion.misc
>> 11 12 1 265 1 10 3 0 0
>> 17 10 11 5 2 0 11 3 6 21
>> 0 | 389 d = comp.graphics
>> 2 1 1 0 349 2 3 0 3 2
>> 6 1 5 1 0 2 15 2 1
>> 2 | 398 e = rec.motorcycles
>> 7 20 3 19 2 254 6 0 2
>> 11 2 39 7 2 0 4 2 2 9
>> 3 | 394 f = comp.os.ms-windows.misc
>> 2 1 13 0 0 0 247 0 1 1
>> 3 0 6 2 4 0 2 3 5
>> 29 | 319 g = alt.atheism
>> 1 1 0 0 2 0 2 361 0 1
>> 2 0 2 0 0 1 3 22 0
>> 1 | 399 h = rec.sport.hockey
>> 3 0 3 1 0 0 5 0 161 0
>> 1 2 12 102 0 0 1 2 11
>> 6 | 310 i = talk.politics.misc
>> 2 8 0 19 0 19 0 0 1
>> 294 10 11 4 2 0 5 0 3 11
>> 6 | 395 j = comp.windows.x
>> 2 10 0 1 1 0 0 0 0 1
>> 347 13 2 1 0 5 3 2 2
>> 0 | 390 k = misc.forsale
>> 1 36 0 6 1 25 0 0 1 6
>> 10 257 2 1 0 34 6 0 6
>> 0 | 392 l = comp.sys.ibm.pc.hardware
>> 2 2 2 2 1 0 12 0 0 6
>> 10 4 312 5 2 13 11 3 3
>> 6 | 396 m = sci.med
>> 2 0 3 2 1 0 0 1 13 0
>> 5 1 2 314 2 0 2 2 10
>> 4 | 364 n = talk.politics.guns
>> 1 0 2 1 1 0 34 1 33 1
>> 3 0 1 8 271 1 4 5 6
>> 3 | 376 o = talk.politics.mideast
>> 3 14 0 8 2 8 3 1 1 7
>> 12 29 6 2 1 245 13 2 32
>> 4 | 393 p = sci.electronics
>> 3 3 0 2 11 0 1 0 2 1
>> 11 6 4 2 0 11 330 4 4
>> 1 | 396 q = rec.autos
>> 0 0 1 0 1 0 4 12 3 1
>> 3 0 0 0 0 5 6 359 1
>> 1 | 397 r = rec.sport.baseball
>> 0 1 0 0 0 1 0 0 3 3
>> 0 0 3 2 1 6 1 6 366
>> 3 | 396 s = sci.crypt
>> 0 2 11 1 1 0 40 0 1 2
>> 3 4 2 1 0 5 0 2 2
>> 321 | 398 t = soc.religion.christian
>>
>> =======================================================
>> Statistics
>> -------------------------------------------------------
>> Kappa 0.7073
>> Accuracy 75%
>> Reliability 70.6238%
>> Reliability (standard deviation) 0.2187
>> Log-likelihood mean : -1.1182
>> 25%-ile : -1.6911
>> 75%-ile : -0.0803
>>
>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>>
>>
>>
>>
>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>>
>>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
>>> and a few other issues.
>>>
>>> We have removed asf-examples*.sh from trunk, as the sample file at the
>>> URL mentioned in your email is not available.
>>> This is something we need to fix and restore in 1.0.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
>>> wrote:
>>>
>>> from the asf-email-examples.sh script:
>>>
>>> # You will need to download or otherwise obtain some or all of the
>>> # Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566)
>>> # to use this script.
>>> # To obtain a full copy you will need to launch an EC2 instance and
>>> # mount the dataset to download it, otherwise you can get a sample of it at
>>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>
>>> It looks like this link is down:
>>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>>
>>> Is there somewhere else that we can get a subset of the ASF emails?
>>>
>>>
>>>
>>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>>> > Subject: Re: MAHOUT 0.9 Release - New URL
>>> > From: andrew.musselman@gmail.com
>>> > To: dev@mahout.apache.org
>>> >
>>> > Sure thing; continuing to smoke test the other examples tonight
>>> >
>>> >
>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <
>>> suneel_marthi@yahoo.com>wrote:
>>> >
>>> > > Thanks, Andrew M. I see that some of the example scripts need to be
>>> > > fixed, as they still refer to the deprecated algorithms.
>>> > > I see that Streaming KMeans failed for you as well.
>>> > >
>>> > > I'll be rolling back the release today to fix these issues.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>>> > > andrew.musselman@gmail.com> wrote:
>>> > >
>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
>>> 64-bit
>>> > > Linux AMI from tarball.
>>> > >
>>> > > All tests pass.
>>> > >
>>> > > *Output of examples:*
>>> > > *asf-email-examples.sh, run on mahout.apache.org
>>> > > <http://mahout.apache.org>:*
>>> > > *recommendations:*
>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
>>> > > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>>> > > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>>> > > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>>> > > 8 [12758:1.0,19409:1.0,11112:1.0]
>>> > > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>>> > > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>>> > > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>>> > > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>>> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>>> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>>> > > [snip]
>>> > >
>>> > > *clustering; kmeans:*
>>> > > [snip]
>>> > > Weight : [props - optional]: Point:
>>> > > 1.0 :
>>> > > [distance-squared=1.0193102046188427]:
>>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>>> 7573:0.204,
>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>>> 9779:0.159,
>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>>> > > 1.0 : [distance-squared=0.9823018320457279]:
>>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>>> 5336:0.106,
>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>>> 7832:0.072,
>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>>> > > 1.0 : [distance-squared=0.9509142993214911]:
>>> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
>>> > > 4419:0.076,
>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>>> 7235:0.048,
>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>>> 7683:0.077,
>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>>> 10225:0.081,
>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>>> > > 41280:0.065, 41696:0.072, 41947:0.118,
>>> > > 43685:0.086, 44077:0.308,
>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>>> > > [snip]
>>> > >
>>> > > *clustering; dirichlet:*
>>> > > Get this complaint:
>>> > > Running Dirichlet with K = 8
>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>> > > HADOOP_CONF_DIR=
>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
>>> > > Unknown program 'dirichlet' chosen.
>>> > >
>>> > > *clustering: minhash:*
>>> > > Running Minhash
>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>> > > HADOOP_CONF_DIR=
>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
>>> > > Unknown program 'minhash' chosen.
>>> > >
>>> > > *classification; standard:*
>>> > > =======================================================
>>> > > Summary
>>> > > -------------------------------------------------------
>>> > > Correctly Classified Instances : 5384 87.7874%
>>> > > Incorrectly Classified Instances : 749 12.2126%
>>> > > Total Classified Instances : 6133
>>> > >
>>> > > =======================================================
>>> > > Confusion Matrix
>>> > > -------------------------------------------------------
>>> > > a b c d <--Classified as
>>> > > 2949 7 531 25 | 3512 a = dev
>>> > > 0 0 0 0 | 0 b = general
>>> > > 99 8 1763 8 | 1878 c = user
>>> > > 41 1 29 672 | 743 d = commits
>>> > >
>>> > > =======================================================
>>> > > Statistics
>>> > > -------------------------------------------------------
>>> > > Kappa 0.7877
>>> > > Accuracy 87.7874%
>>> > > Reliability 53.658%
>>> > > Reliability (standard deviation) 0.4911
>>> > >
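The summary figures in this report follow directly from the confusion matrix: the correct count is the sum of the diagonal, and accuracy is that sum over the total. A minimal Python sketch, using the four-class matrix from the "classification; standard" output above and assuming rows are true labels and columns are predicted labels (as the `<--Classified as` header and the row totals suggest); the function and variable names are illustrative only:

```python
# Confusion matrix copied from the "classification; standard" report.
labels = ["dev", "general", "user", "commits"]
matrix = [
    [2949, 7, 531, 25],   # a = dev
    [0, 0, 0, 0],         # b = general (no test instances)
    [99, 8, 1763, 8],     # c = user
    [41, 1, 29, 672],     # d = commits
]


def accuracy(matrix):
    """Return (correct, total, accuracy) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


def recall_per_class(matrix, labels):
    """Fraction of each true class that was classified correctly.

    Classes with no instances get None rather than a division by zero.
    """
    result = {}
    for i, (label, row) in enumerate(zip(labels, matrix)):
        n = sum(row)
        result[label] = row[i] / n if n else None
    return result


correct, total, acc = accuracy(matrix)
print(f"correct={correct} total={total} accuracy={acc * 100:.4f}%")
print(recall_per_class(matrix, labels))
```

The per-class view makes it easier to see where the errors concentrate, for example `dev` messages classified as `user`, and that the `general` class has no test instances at all.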
>>> > > *classification; complementary:*
>>> > > =======================================================
>>> > > Summary
>>> > > -------------------------------------------------------
>>> > > Correctly Classified Instances : 5530 90.1679%
>>> > > Incorrectly Classified Instances : 603 9.8321%
>>> > > Total Classified Instances : 6133
>>> > >
>>> > > =======================================================
>>> > > Confusion Matrix
>>> > > -------------------------------------------------------
>>> > > a b c d <--Classified as
>>> > > 3168 0 276 68 | 3512 a = dev
>>> > > 0 0 0 0 | 0 b = general
>>> > > 196 0 1652 30 | 1878 c = user
>>> > > 25 0 8 710 | 743 d = commits
>>> > >
>>> > > =======================================================
>>> > > Statistics
>>> > > -------------------------------------------------------
>>> > > Kappa 0.8259
>>> > > Accuracy 90.1679%
>>> > > Reliability 54.7459%
>>> > > Reliability (standard deviation) 0.5005
>>> > >
>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>>> > >
>>> > > *classification; sgd, with three categories:*
>>> > > Running SGD Training
>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
>>> > > HADOOP_CONF_DIR=
>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
>>> > > 24168 training files
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
>>> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
>>> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>>> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>>> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
>>> > > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>> > > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>> > > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> > > at java.lang.reflect.Method.invoke(Method.java:622)
>>> > > at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>>> > > at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>>> > > at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>>> > > at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>>> > > at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>>> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>>> > > at
>>> > >
>>> > >
>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>>> > > at
>>> > >
>>> > >
>>> org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>>> > > at
>>> > > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>> > > at
>>> > >
>>> > >
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>> > > at
>>> > >
>>> > >
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> > > at java.lang.Thread.run(Thread.java:701)
>>> > >
>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>>> > > andrew.musselman@gmail.com> wrote:
>>> > >
>>> > > > Trying out the build today
>>> > > >
>>> > > >
>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>>> suneel_marthi@yahoo.com
>>> > > >wrote:
>>> > > >
>>> > > >> This is an issue (trivial one though) that needs to be fixed for
>>> 0.9
>>> > > >> Release, will be rerolling the release today (in the next few
>>> hrs) and
>>> > > >> putting out a new release candidate in staging.
>>> > > >>
>>> > > >> Thanks for reporting this Andrew P.
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>>> > > ap.dev@outlook.com>
>>> > > >> wrote:
>>> > > >>
>>> > > >> I ran through the tests on a CentOS VM
>>> > > AMD64 2 cores 4 GB RAM. Had
>>> > > >> a bit of trouble getting the Hadoop natives to compile and
>>> therefore may
>>> > > >> have run into some problems because of the hadoop setup. Ran
>>> into some
>>> > > >> problems in the example scripts. Particularly with
>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest
>>> of the
>>> > > >> examples when I'm sure I've got Hadoop set up right.
>>> > > >>
>>> > > >>
>>> > > >> Apache Maven 3.1.2-SNAPSHOT
>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
>>> "amd64",
>>> > > >> family: "unix"
>>> > > >> $MAHOUT_LOCAL=true
>>> > > >> Hadoop 2.2.0
>>> > > >>
>>> > > >>
>>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed
>>> (tar)
>>> > > >> [passed ]
>>> > > >>
>>> > > >> b) Verify u r able to compile the
>>> > > distro
>>> > > >>
>>> > > >> mvn compile- [passed with warnings]
>>> > > >>
>>> > > >> [WARNING] Expected all dependencies to require Scala
>>> version: 2.9.3
>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires
>>> scala
>>> > > >> version: 2.9.3
>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala
>>> > > >> version: 2.9.2
>>> > > >> [WARNING] Multiple versions of scala libraries detected!
>>> > > >>
>>> > > >> c) Run through the unit tests: mvn clean test
>>> > > >> mvn clean test [passed]
>>> > > >>
>>> > > >> d) Run the
>>> > > >> example scripts under $MAHOUT_HOME/examples/bin.
>>> > > >> Please run through all the different options in each script
>>> > > >>
>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
>>> > > >>
>>> > > >>
>>> > > ./cluster-syntheticcontrol.sh ->1 [works]
>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
>>> > > >>
>>> > > >>
>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>>> > > >> [...]
>>> > > >> WARNING: Unable to add class:
>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>> > > >> java.lang.ClassNotFoundException:
>>> > > >> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>>> > > >> at
>>> > > >> java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>> > > >> at java.security.AccessController.doPrivileged(Native
>>> Method)
>>> > > >> at
>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>> > > >> at
>>> > > java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>> > > >> at
>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>> > > >> at java.lang.Class.forName0(Native Method)
>>> > > >> at java.lang.Class.forName(Class.java:171)
>>> > > >> at
>>> > > >>
>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>> > > >> at
>>> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>> > > >>
>>> > > >>
>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>> > > >>
>>> > > >> WARNING: Unable to add class:
>>> > > >>
>>> > > org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>> > > >> java.lang.ClassNotFoundException:
>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>>> > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>> > > >> at java.security.AccessController.doPrivileged(Native
>>> Method)
>>> > > >> at
>>> java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>> > > >> at
>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>> > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>> > > >> at java.lang.Class.forName0(Native Method)
>>> > > >> at
>>> > > java.lang.Class.forName(Class.java:171)
>>> > > >> at
>>> > > >>
>>> org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>>> > > >> at
>>> > > >> org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>> > > >> WARNING: No
>>> > > >> org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props
>>> found
>>> > > on
>>> > > >> classpath, will use command-line arguments only
>>> > > >> Unknown program
>>> > > >> 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job'
>>> chosen.
>>> > > >>
>>> > > >>
>>> > > >> ./classify-20newsgroups.sh ->1 [works]
>>> > > >> ./classify-20newsgroups.sh ->2 [works]
>>> > > >>
>>> > > >>
>>> > > >> cluster-reuters.sh ->1 [works]
>>> > > >>
>>> > > cluster-reuters.sh ->2 [works]
>>> > > >> cluster-reuters.sh ->3 [works]
>>> > > >>
>>> > > >> Same error as noted previously in the thread:
>>> > > >>
>>> > > >> cluster-reuters.sh ->4 [0 clusters]
>>> > > >>
>>> > > >> [...]
>>> > > >>
>>> > > >> WARNING: No qualcluster.props found on classpath, will use
>>> > > >> command-line arguments only
>>> > > >> Num clusters: 0; maxDistance: 0.000000
>>> > > >> [Dunn Index]
>>> > > >> First: Infinity
>>> > > >> [Davies-Bouldin Index] First: NaN
>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
>>> > > >> cluster,distance.mean,distance.sd
>>> > > >>
>>> > >
>>> > >
>>> ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>>> > > >> > From: suneel_marthi@yahoo.com
>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
>>> > > >> >
>>> > > >> > Third time's a Charm!!!
>>> > > >> >
>>> > > >> >
>>> > > >> > Here's the new URL for Mahout 0.9 Release:
>>> > > >> >
>>> > > >>
>>> > >
>>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>>> > > >> >
>>> > > >> > For those volunteering to test this, some of the things to be
>>> > > verified:
>>> > > >> >
>>> > > >> > a) Verify that u can unpack the release (tar or zip)
>>> > > >> > b) Verify u r able to compile the distro
>>> > > >> > c) Run through the unit tests: mvn clean test
>>> > > >> > d) Run the example scripts
>>> > > >> under $MAHOUT_HOME/examples/bin. Please run through all the
>>> different
>>> > > >> options in each script.
>>> > > >> >
>>> > > >> >
>>> > > >> > Committers
>>> > > >> > and PMC members:
>>> > > >> > ---------------------------------------
>>> > > >> >
>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
>>> > > >> >
>>> > > >> >
>>> > > >> > Thanks and
>>> > > Regards.
>>> > > >>
>>> > > >
>>> > > >
>>> > >
>>>
>>
>>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
*cluster-reuters.sh*
*k-means:*
[snip]
:VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016,
0.38:0.020, 0.4:0.007, 0.5:0.032, 0
Top Terms:
banks => 3.841823268955143
bank => 3.80633066361209
debt => 3.28065219870794
said => 2.5965700942088583
he => 2.335682813857497
foreign => 2.2217853688201403
billion => 2.1970193848291335
would => 1.9932392063955617
loans => 1.9309276792854233
interest => 1.787324501938
have => 1.762981951432578
its => 1.7615109954971866
which => 1.5822081148036862
has => 1.5600708189041956
dlrs => 1.5571038313005996
finance => 1.5539758811252924
new => 1.5176015811577555
had => 1.5138723701401844
brazil => 1.5083369853593172
payments => 1.4539044255886517
Weight : [props - optional]: Point:
:VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007,
0.40:0.003, 0.5:0.009, 0.57:0
Top Terms:
vs => 6.126130791333171
net => 4.012191567277523
cts => 3.822006848832744
shr => 3.6786004856764527
mln => 2.9011643584038698
loss => 2.788368861463607
qtr => 2.714140225051522
revs => 2.4739861236454717
profit => 1.8146888090247015
note => 1.7977163272138388
dlrs => 1.6164390808155846
avg => 1.3901765773336587
shrs => 1.3856326531419314
mths => 1.3168717272038506
4th => 1.2161158425617289
oper => 1.182419473776814
year => 1.178086061733047
nine => 1.0670554836445316
3rd => 1.041334410056592
inc => 1.0019361981554935
Weight : [props - optional]: Point:
Inter-Cluster Density: 0.45562152681859414
Intra-Cluster Density: 0.6952712632167628
CDbw Inter-Cluster Density: 0.0
CDbw Intra-Cluster Density: 16.486930227598684
CDbw Separation: 194.49005884464628
*fuzzy k-means:*
:SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
0.01:0.005, 0.02:0.002, 0.0
Top Terms:
said => 1.8665592354713065
its => 1.1335212213411592
pct => 1.0862816801353348
dlrs => 1.0854998884993752
mln => 1.043163996400643
from => 0.9684961110525736
has => 0.912161511978058
company => 0.8754186972808333
mar => 0.8675333452422878
inc => 0.7678617590362815
would => 0.7610968883652675
he => 0.7459988770503974
which => 0.7435613119406804
year => 0.7302840632748394
u.s => 0.7281061062439116
shares => 0.7260764102983083
corp => 0.7179807367808658
new => 0.7044203783157115
stock => 0.6962010978721442
have => 0.6464265467298506
:SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
0.01:0.004, 0.02:0.002, 0.02
Top Terms:
said => 1.864911184196927
dlrs => 1.199286689822081
mln => 1.1802134783562215
pct => 1.1529704214798124
its => 1.1184398851519701
from => 1.016647848050332
company => 0.894703604722841
mar => 0.879986159541356
has => 0.8642799128491316
year => 0.8271823503717782
inc => 0.7871293745341424
corp => 0.737705498468879
which => 0.722975201852743
would => 0.708000816484415
u.s => 0.7073294276173905
billion => 0.7055723996916351
he => 0.7042684217823294
new => 0.6834737905434939
shares => 0.6753327384172428
stock => 0.6576225144041699
:SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001,
0.01:0.006, 0.02:0.002, 0.02
Top Terms:
said => 1.8796076179735086
its => 1.172025965452378
dlrs => 1.130422792460914
pct => 1.082038255241358
mln => 1.0772146872767114
company => 0.9662235879639138
from => 0.9473172871605616
has => 0.9224712965830099
mar => 0.8769325856924421
inc => 0.8360245257169788
shares => 0.8334595641384324
stock => 0.7704621839612175
corp => 0.7682400250301806
which => 0.7389988207856137
would => 0.7339708917389389
year => 0.7088414843731325
new => 0.7038109468655172
he => 0.6993994455501005
u.s => 0.6772649147622415
share => 0.6241804830055171
*lda:*
[snip]
21539
{0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
21540
{0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
21541
{0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
21542
{0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
21543
{0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
21544
{0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
21545
{0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
21546
{0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
21547
{0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
21548
{0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
21549
{0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.9387255234958607E-4,0.04:3.454752498776842E-4}
21550
{0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.0024884528530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4}
21551
{0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.15744077685124136,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.013226123091680062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.554836644848264E-4,0.05:2.0489216962221778E-4}
21552
{0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.05910378841288311,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277246,0.046:0.0022569760697531112}
21553
{0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.0032390933761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037608224814005E-4,0.0625:2.569989005512836E-4}
21554
{0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.12383071192450323,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946628810436,0.073:6.131643125523258E-4}
21555
{0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.008404234144369286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.004651573366090424,0.025:0.002180181204608079}
21556
{0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.007368001305747207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.5240076581247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.6810457882911134E-5,0.006913:3.469265861451538E-5}
21557
{0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E-5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.9194783598477564E-5,0.073:1.6900267481892075E-5}
21558
{0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.1953766739610202E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.7960967280609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:3.5964741541364354E-5,0.057:3.538185773175377E-5}
21559
{0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.07779866329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:5.019763638644379E-4,0.055:2.6705221435486376E-4}
21560
{0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.58984012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5223193391279898E-5,0.073:1.1687759018690983E-5}
21561
{0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.03778566273654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.979138875277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.6820742114533983E-5,0.025:2.6208487691114472E-5}
21562
{0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.568457462720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443941062146E-5,0.046:5.9284232059258864E-5}
21563
{0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.3682157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6}
21564
{0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.4910843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.007050:1.2337617858874285E-5,0.07:1.121302251254011E-5}
21565
{0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.09937533960784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.003589347327961106,0.073:4.8607269023994543E-4}
21566
{0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.056622094108767465,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.0013149375957061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.107021879325011E-4,0.007050:1.8121393989190674E-4}
21567
{0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.541344869853217E-4,0.046:1.0154945247629696E-4}
21568
{0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.002283924010657722,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.9070227131596934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067689954084E-5,0.006913:2.273173553155518E-5}
21569
{0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.017974034648702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.1939663934259253E-4,0:8.736457986006156E-5}
21570
{0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.005434610908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114444135088393E-4,0.003:2.2360302221231853E-4}
21571
{0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.348355382104383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4005272512340306E-5,0.03:4.1089625609562384E-5}
21572
{0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.08235454447065854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.021965070167039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.95681856192452E-5,0:1.9037190007395713E-5}
21573
{0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.1974451444757112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.268045692763623E-5,0.073:6.607926928741721E-5}
21574
{0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.015492017672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.0020124341216292752,0:0.0013828685922332715}
21575
{0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.22584628506521745,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.6015914759364425E-4,0.006913:1.0021292427951807E-4}
21576
{0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.07102865286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.033355081179026046,0.006913:0.015980584385115525}
21577
{0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.0021941315048273186,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.234942557920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.007050:3.0380683715134534E-5,0.05:2.8920851352755496E-5}
*Streaming k-means:*
[snip]
INFO: Number of Centroids: 0
Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local23982482_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:120)
at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
[snip]
WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 535 ms (Minutes: 0.008916666666666666)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
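A note on reading the streaming k-means trace above: the log reports "Number of Centroids: 0", and the exception arguments [10.000000149011612, 0] suggest the reducer was asked to hold out about 10% of 0 vectors as a test set, which trips the precondition in BallKMeans.splitTrainTest. The %.1f and %d placeholders show up unformatted, which is consistent with Guava's Preconditions substituting only %s and appending leftover arguments in square brackets. Below is a minimal Python sketch of that kind of check. The name split_train_test and the truncating rounding rule are assumptions made for illustration, not Mahout's actual code.

```python
def split_train_test(num_vectors, test_probability):
    """Return (num_train, num_test) for a hold-out split.

    Hypothetical helper: rejects any split where either side would be
    empty, which is the condition the logged message describes.
    """
    # Assumption: the test count is the truncated fraction of the input size.
    num_test = int(test_probability * num_vectors)
    num_train = num_vectors - num_test
    if num_test <= 0 or num_train <= 0:
        raise ValueError(
            "Must have nonzero number of training and test vectors. "
            f"Asked for {test_probability * 100:.1f} % of {num_vectors} "
            "vectors for test"
        )
    return num_train, num_test


if __name__ == "__main__":
    # A non-empty input splits cleanly.
    print(split_train_test(200, 0.25))
    # An empty input (0 centroids reaching the reducer) is rejected.
    try:
        split_train_test(0, 0.1)
    except ValueError as err:
        print("rejected:", err)
```

If this reading is right, the failure is downstream of whatever left the reducer with no centroids, so the input to the streaming k-means step is the place to look first.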
On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:
> *classify-20newsgroups.sh*
>
> *Complementary naive bayes:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 11207 98.9406%
> Incorrectly Classified Instances : 120 1.0594%
> Total Classified Instances : 11327
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d e f g h i j
> k l m n o p q r s
> t <--Classified as
> 475 0 0 1 0 0 0 0 0 0
> 0 0 0 0 1 0 1 0 0
> 0 | 478 a = alt.atheism
> 0 597 1 1 0 1 1 0 0 0
> 0 1 0 2 1 0 0 0 0
> 0 | 605 b = comp.graphics
> 0 1 620 3 0 1 0 0 0 0
> 0 1 0 0 1 0 0 0 0
> 0 | 627 c = comp.os.ms-windows.misc
> 1 1 1 593 2 0 0 0 0 0
> 0 0 0 0 0 1 0 0 0
> 0 | 599 d = comp.sys.ibm.pc.hardware
> 0 1 1 0 568 0 1 0 0 0
> 1 1 2 0 0 0 0 1 0
> 0 | 576 e = comp.sys.mac.hardware
> 0 4 2 0 0 581 0 0 0 0
> 0 0 0 0 0 0 0 0 0
> 0 | 587 f = comp.windows.x
> 0 0 0 1 2 0 571 3 0 0
> 1 1 4 1 0 0 0 0 0
> 0 | 584 g = misc.forsale
> 0 0 0 1 0 0 0 589 1 0
> 0 1 1 0 0 0 0 0 0
> 0 | 593 h = rec.autos
> 0 0 0 0 0 0 0 1 565 0
> 0 0 0 0 1 0 0 0 0
> 0 | 567 i = rec.motorcycles
> 0 0 0 0 0 0 0 0 0
> 600 2 0 0 0 1 0 0 0 0
> 0 | 603 j = rec.sport.baseball
> 0 0 0 0 0 0 0 0 0 1
> 584 0 0 0 0 0 0 0 0
> 0 | 585 k = rec.sport.hockey
> 0 0 0 0 0 0 0 0 0 0
> 0 579 0 0 0 0 0 1 0
> 0 | 580 l = sci.crypt
> 0 0 0 1 3 0 2 0 0 2
> 0 0 567 1 2 1 0 0 0
> 0 | 579 m = sci.electronics
> 0 0 0 0 0 0 0 0 0 0
> 0 0 1 605 0 0 0 0 0
> 0 | 606 n = sci.med
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 602 0 0 0 0
> 0 | 602 o = sci.space
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 1 0 602 0 0 1
> 0 | 604 p = soc.religion.christian
> 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 556 0 0
> 0 | 556 q = talk.politics.mideast
> 0 0 1 0 0 0 0 0 0 0
> 0 1 0 0 1 0 0 568 0
> 0 | 571 r = talk.politics.guns
> 11 0 0 0 0 0 0 0 0 1
> 0 0 0 1 3 8 1 4 338
> 2 | 369 s = talk.religion.misc
> 0 0 0 0 0 0 0 0 0 0
> 1 0 0 0 1 0 3 4 0
> 447 | 456 t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.9806
> Accuracy 98.9406%
> Reliability 94.0932%
> Reliability (standard deviation) 0.2163
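As a sanity check on the summary figures quoted above, the accuracy percentage can be recomputed from the correctly and incorrectly classified counts. This is a small Python sketch; accuracy_percent is a hypothetical helper name, and it only reproduces the arithmetic, not how Mahout derives Kappa or Reliability.

```python
def accuracy_percent(correct, incorrect):
    """Percentage of correctly classified instances out of all classified."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no classified instances")
    return 100.0 * correct / total


if __name__ == "__main__":
    # Counts taken from the complementary naive bayes training-set summary.
    print(f"{accuracy_percent(11207, 120):.4f}%")  # prints 98.9406%
```

The same call with the holdout counts (6715 correct, 804 incorrect) gives 89.3071%, matching the second summary in this message.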
>
> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 15870 ms (Minutes: 0.2645)
> + echo 'Testing on holdout set'
> Testing on holdout set
> + ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m
> /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow
> -o /tmp/mahout-work-ec2-user/20news-testing -c
>
> [snip]
>
> INFO: Complementary Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 6715 89.3071%
> Incorrectly Classified Instances : 804 10.6929%
> Total Classified Instances : 7519
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d e f g h i j
> k l m n o p q r s
> t <--Classified as
> 298 0 0 0 0 0 0 0 0 1
> 0 0 0 1 2 5 1 0 13
> 0 | 321 a = alt.atheism
> 0 298 11 6 1 12 2 2 1 1
> 3 8 3 4 2 4 1 4 4
> 1 | 368 b = comp.graphics
> 1 17 286 16 4 9 6 3 2 0
> 1 0 1 7 1 0 2 1 0
> 1 | 358 c = comp.os.ms-windows.misc
> 2 6 11 309 9 5 14 8 1 0
> 2 0 6 4 2 0 1 2 1
> 0 | 383 d = comp.sys.ibm.pc.hardware
> 0 10 8 7 334 7 5 5 2 0
> 3 0 2 1 1 0 1 1 0
> 0 | 387 e = comp.sys.mac.hardware
> 1 13 7 8 2 355 2 0 2 0
> 0 5 1 1 3 0 0 1 0
> 0 | 401 f = comp.windows.x
> 0 7 11 29 12 9 268 16 8 4
> 3 2 6 4 2 1 3 1 2
> 3 | 391 g = misc.forsale
> 0 1 0 0 3 0 7 362 8 2
> 2 1 2 0 2 0 1 2 0
> 4 | 397 h = rec.autos
> 0 0 0 1 0 0 1 0 423 0
> 0 0 2 1 0 1 0 0 0
> 0 | 429 i = rec.motorcycles
> 0 0 1 0 0 0 0 2 2
> 371 8 0 2 3 0 2 0 0 0
> 0 | 391 j = rec.sport.baseball
> 0 0 1 0 0 0 1 0 0 2
> 409 0 0 0 0 0 0 0 0
> 1 | 414 k = rec.sport.hockey
> 0 0 1 2 1 0 1 0 0 0
> 0 404 0 0 0 0 0 1 0
> 1 | 411 l = sci.crypt
> 0 5 4 11 1 3 7 9 2 5
> 3 3 339 2 6 0 1 1 2
> 1 | 405 m = sci.electronics
> 0 4 0 1 0 0 0 1 0 1
> 1 0 3 367 3 1 2 0 0
> 0 | 384 n = sci.med
> 0 1 2 0 1 0 2 0 0 1
> 0 0 1 1 375 0 1 0 0
> 0 | 385 o = sci.space
> 4 2 1 1 0 0 1 1 2 0
> 0 1 1 5 1 367 4 0 1
> 1 | 393 p = soc.religion.christian
> 0 1 0 0 0 0 0 0 0 2
> 0 0 0 0 0 2 378 0 1
> 0 | 384 q = talk.politics.mideast
> 0 0 0 0 0 2 1 1 1 1
> 0 3 0 3 0 0 2 319 2
> 4 | 339 r = talk.politics.guns
> 32 0 0 1 0 0 0 0 0 1
> 1 1 0 2 2 26 5 7 175
> 6 | 259 s = talk.religion.misc
> 0 0 0 2 0 0 0 0 0 1
> 2 2 0 1 2 1 10 18 2
> 278 | 319 t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.8594
> Accuracy 89.3071%
> Reliability 84.611%
> Reliability (standard deviation) 0.2148
>
> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
>
>
> *Naive bayes:*
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 11286 99.0869%
> Incorrectly Classified Instances : 104 0.9131%
> Total Classified Instances : 11390
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> 474 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   2   1   | 477 a = alt.atheism
> 0   566 0   2   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   | 569 b = comp.graphics
> 0   10  590 29  2   4   1   0   0   0   0   0   1   0   0   0   0   0   0   1   | 638 c = comp.os.ms-windows.misc
> 0   0   0   596 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   | 596 d = comp.sys.ibm.pc.hardware
> 0   0   0   0   575 0   1   0   0   0   0   0   1   0   0   0   0   0   0   0   | 577 e = comp.sys.mac.hardware
> 0   2   2   2   0   593 1   0   0   0   0   0   0   0   1   0   0   0   0   0   | 601 f = comp.windows.x
> 0   0   0   1   0   0   589 1   0   0   1   0   2   0   0   0   0   0   0   0   | 594 g = misc.forsale
> 0   0   0   0   0   0   0   594 0   0   0   0   0   0   0   0   0   0   0   0   | 594 h = rec.autos
> 0   0   0   0   0   0   0   0   611 0   0   0   0   0   0   0   0   0   0   0   | 611 i = rec.motorcycles
> 0   0   0   0   0   0   0   0   0   616 1   0   0   0   0   0   0   0   0   0   | 617 j = rec.sport.baseball
> 0   0   0   0   0   0   1   0   0   0   620 0   0   0   0   0   0   0   0   0   | 621 k = rec.sport.hockey
> 0   0   0   0   0   0   0   0   0   0   0   580 0   0   0   0   0   1   0   0   | 581 l = sci.crypt
> 0   0   0   3   1   0   0   0   0   0   0   0   571 0   0   0   0   0   0   0   | 575 m = sci.electronics
> 0   0   0   0   0   0   0   0   0   0   0   0   2   583 0   0   0   0   0   0   | 585 n = sci.med
> 0   0   0   0   0   0   0   0   0   0   0   0   0   1   599 0   0   0   0   0   | 600 o = sci.space
> 0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   615 0   0   0   0   | 616 p = soc.religion.christian
> 1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   560 0   0   0   | 562 q = talk.politics.mideast
> 0   0   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   548 0   1   | 551 r = talk.politics.guns
> 10  0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   0   2   344 1   | 359 s = talk.religion.misc
> 0   0   0   0   0   0   0   0   0   0   0   1   1   0   0   0   0   2   0   462 | 466 t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.9847
> Accuracy 99.0869%
> Reliability 94.3334%
> Reliability (standard deviation) 0.2169
>
> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 14304 ms (Minutes: 0.2384)
> + echo 'Testing on holdout set'
> Testing on holdout set
>
> [snip]
>
> INFO: Standard NB Results:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 6718 90.1019%
> Incorrectly Classified Instances : 738 9.8981%
> Total Classified Instances : 7456
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> 294 0   0   0   0   0   0   0   0   0   0   2   0   1   1   6   1   1   16  0   | 322 a = alt.atheism
> 0   345 6   14  6   11  6   0   0   0   0   5   7   1   3   0   0   0   0   0   | 404 b = comp.graphics
> 2   29  177 78  22  19  9   1   0   0   0   4   2   0   1   1   0   0   1   1   | 347 c = comp.os.ms-windows.misc
> 1   9   2   335 18  2   10  0   0   0   1   0   8   0   0   0   0   0   0   0   | 386 d = comp.sys.ibm.pc.hardware
> 1   4   2   13  347 3   5   1   0   0   1   0   7   1   0   0   0   1   0   0   | 386 e = comp.sys.mac.hardware
> 0   20  0   4   0   352 4   0   0   0   0   0   1   1   3   0   1   0   1   0   | 387 f = comp.windows.x
> 0   2   0   21  5   1   323 7   2   2   0   2   12  0   3   0   0   0   0   1   | 381 g = misc.forsale
> 0   1   0   0   1   0   15  363 8   1   0   0   4   1   0   0   0   1   0   1   | 396 h = rec.autos
> 0   1   0   0   0   0   6   6   370 0   0   0   0   1   0   0   0   0   1   0   | 385 i = rec.motorcycles
> 1   0   0   1   1   0   2   1   2   362 5   0   2   0   0   0   0   0   0   0   | 377 j = rec.sport.baseball
> 0   0   0   1   2   0   0   0   0   3   371 0   0   0   0   0   0   0   0   1   | 378 k = rec.sport.hockey
> 0   3   1   0   1   0   2   0   0   0   0   396 0   1   0   0   1   1   1   3   | 410 l = sci.crypt
> 0   7   0   7   7   2   6   4   0   0   0   1   369 2   2   0   0   0   0   2   | 409 m = sci.electronics
> 0   3   0   2   1   0   2   0   0   0   0   1   4   383 4   0   0   1   0   4   | 405 n = sci.med
> 0   5   0   0   1   0   3   0   0   0   0   0   1   0   374 1   0   0   1   1   | 387 o = sci.space
> 6   2   0   1   1   0   0   1   0   1   0   0   1   5   0   352 2   1   7   1   | 381 p = soc.religion.christian
> 1   1   0   0   0   0   0   0   0   0   1   0   0   0   0   0   373 1   0   1   | 378 q = talk.politics.mideast
> 0   0   0   0   0   0   1   0   1   0   0   2   0   0   0   0   0   346 2   7   | 359 r = talk.politics.guns
> 26  1   0   1   0   0   0   2   0   1   1   0   0   1   1   20  2   6   200 7   | 269 s = talk.religion.misc
> 1   0   0   0   0   0   0   2   0   0   1   0   0   2   2   0   1   14  0   286 | 309 t = talk.politics.misc
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.8726
> Accuracy 90.1019%
> Reliability 85.4491%
> Reliability (standard deviation) 0.2222
>
> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10878 ms (Minutes: 0.1813)
>
> *SGD:*
> 7532 test files
>
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 5649 75%
> Incorrectly Classified Instances : 1883 25%
> Total Classified Instances : 7532
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
> 186 6   3   10  5   0   33  4   13  15  7   1   24  15  3   15  5   5   29  15  | 394 a = sci.space
> 5   309 0   3   2   5   0   0   0   1   9   21  2   0   0   18  4   4   1   1   | 385 b = comp.sys.mac.hardware
> 4   1   101 3   0   1   63  0   7   0   1   1   5   16  3   0   3   7   1   34  | 251 c = talk.religion.misc
> 11  12  1   265 1   10  3   0   0   17  10  11  5   2   0   11  3   6   21  0   | 389 d = comp.graphics
> 2   1   1   0   349 2   3   0   3   2   6   1   5   1   0   2   15  2   1   2   | 398 e = rec.motorcycles
> 7   20  3   19  2   254 6   0   2   11  2   39  7   2   0   4   2   2   9   3   | 394 f = comp.os.ms-windows.misc
> 2   1   13  0   0   0   247 0   1   1   3   0   6   2   4   0   2   3   5   29  | 319 g = alt.atheism
> 1   1   0   0   2   0   2   361 0   1   2   0   2   0   0   1   3   22  0   1   | 399 h = rec.sport.hockey
> 3   0   3   1   0   0   5   0   161 0   1   2   12  102 0   0   1   2   11  6   | 310 i = talk.politics.misc
> 2   8   0   19  0   19  0   0   1   294 10  11  4   2   0   5   0   3   11  6   | 395 j = comp.windows.x
> 2   10  0   1   1   0   0   0   0   1   347 13  2   1   0   5   3   2   2   0   | 390 k = misc.forsale
> 1   36  0   6   1   25  0   0   1   6   10  257 2   1   0   34  6   0   6   0   | 392 l = comp.sys.ibm.pc.hardware
> 2   2   2   2   1   0   12  0   0   6   10  4   312 5   2   13  11  3   3   6   | 396 m = sci.med
> 2   0   3   2   1   0   0   1   13  0   5   1   2   314 2   0   2   2   10  4   | 364 n = talk.politics.guns
> 1   0   2   1   1   0   34  1   33  1   3   0   1   8   271 1   4   5   6   3   | 376 o = talk.politics.mideast
> 3   14  0   8   2   8   3   1   1   7   12  29  6   2   1   245 13  2   32  4   | 393 p = sci.electronics
> 3   3   0   2   11  0   1   0   2   1   11  6   4   2   0   11  330 4   4   1   | 396 q = rec.autos
> 0   0   1   0   1   0   4   12  3   1   3   0   0   0   0   5   6   359 1   1   | 397 r = rec.sport.baseball
> 0   1   0   0   0   1   0   0   3   3   0   0   3   2   1   6   1   6   366 3   | 396 s = sci.crypt
> 0   2   11  1   1   0   40  0   1   2   3   4   2   1   0   5   0   2   2   321 | 398 t = soc.religion.christian
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.7073
> Accuracy 75%
> Reliability 70.6238%
> Reliability (standard deviation) 0.2187
> Log-likelihood mean : -1.1182
> 25%-ile : -1.6911
> 75%-ile : -0.0803
>
> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
>
>
>
>
> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com>wrote:
>
>> Thanks, Andrew, for reporting that. I rolled back the release to fix this
>> and a few other issues.
>>
>> We have removed asf-examples*.sh from trunk, as the sample file at the URL
>> mentioned in your email is not available.
>> This is something we need to fix and restore in 1.0.
>>
>>
>>
>>
>>
>>
>>
>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>>
>> from the asf-email-examples.sh script:
>>
>> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
>> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
>> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>>
>> It looks like the link
>> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>> is down.
>>
>> Is there somewhere else that we can get a subset of the ASF emails?
>>
>>
>>
>> Date: Tue, 21 Jan 2014 09:48:06 -0800
>> > Subject: Re: MAHOUT 0.9 Release - New URL
>> > From: andrew.musselman@gmail.com
>> > To: dev@mahout.apache.org
>> >
>> > Sure thing; continuing to smoke test the other examples tonight
>> >
>> >
>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com
>> >wrote:
>> >
>> > > Thanks Andrew M. I see that some of the example scripts need to be fixed, as
>> > > they still refer to the deprecated algorithms.
>> > > I see that Streaming KMeans has failed for you as well.
>> > >
>> > > I'll be rolling back the release today to fix these issues.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
>> > > andrew.musselman@gmail.com> wrote:
>> > >
>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
>> > > Linux AMI from tarball.
>> > >
>> > > All tests pass.
>> > >
>> > > *Output of examples:*
>> > > *asf-email-examples.sh, run on mahout.apache.org
>> > > <http://mahout.apache.org>:*
>> > > *recommendations:*
>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
>> > > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
>> > > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
>> > > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
>> > > 8 [12758:1.0,19409:1.0,11112:1.0]
>> > > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
>> > > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
>> > > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
>> > > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
>> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
>> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
>> > > [snip]
>> > >
>> > > *clustering; kmeans:*
>> > > [snip]
>> > > Weight : [props - optional]: Point:
>> > > 1.0 :
>> > > [distance-squared=1.0193102046188427]:
>> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
>> 7573:0.204,
>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093,
>> 9779:0.159,
>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
>> > > 39789:0.110, 40743:0.190, 45775:0.086]
>> > > 1.0 : [distance-squared=0.9823018320457279]:
>> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
>> 5336:0.106,
>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173,
>> 7832:0.072,
>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
>> > > 1.0 : [distance-squared=0.9509142993214911]:
>> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
>> > > 4419:0.076,
>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056,
>> 7235:0.048,
>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123,
>> 7683:0.077,
>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
>> 10225:0.081,
>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
>> > > 41280:0.065, 41696:0.072, 41947:0.118,
>> > > 43685:0.086, 44077:0.308,
>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
>> > > [snip]
>> > >
>> > > *clustering; dirichlet:*
>> > > Get this complaint:
>> > > Running Dirichlet with K = 8
>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
>> > > Unknown program 'dirichlet' chosen.
>> > >
>> > > *clustering: minhash:*
>> > > Running Minhash
>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
>> > > Unknown program 'minhash' chosen.
>> > >
>> > > *classification; standard:*
>> > > =======================================================
>> > > Summary
>> > > -------------------------------------------------------
>> > > Correctly Classified Instances : 5384 87.7874%
>> > > Incorrectly Classified Instances : 749 12.2126%
>> > > Total Classified Instances : 6133
>> > >
>> > > =======================================================
>> > > Confusion Matrix
>> > > -------------------------------------------------------
>> > > a    b    c    d    <--Classified as
>> > > 2949 7    531  25   | 3512 a = dev
>> > > 0    0    0    0    | 0    b = general
>> > > 99   8    1763 8    | 1878 c = user
>> > > 41   1    29   672  | 743  d = commits
>> > >
>> > > =======================================================
>> > > Statistics
>> > > -------------------------------------------------------
>> > > Kappa 0.7877
>> > > Accuracy 87.7874%
>> > > Reliability 53.658%
>> > > Reliability (standard deviation) 0.4911
>> > >
>> > > *classification; complementary:*
>> > > =======================================================
>> > > Summary
>> > > -------------------------------------------------------
>> > > Correctly Classified Instances : 5530 90.1679%
>> > > Incorrectly Classified Instances : 603 9.8321%
>> > > Total Classified Instances : 6133
>> > >
>> > > =======================================================
>> > > Confusion Matrix
>> > > -------------------------------------------------------
>> > > a    b    c    d    <--Classified as
>> > > 3168 0    276  68   | 3512 a = dev
>> > > 0    0    0    0    | 0    b = general
>> > > 196  0    1652 30   | 1878 c = user
>> > > 25   0    8    710  | 743  d = commits
>> > >
>> > > =======================================================
>> > > Statistics
>> > > -------------------------------------------------------
>> > > Kappa 0.8259
>> > > Accuracy 90.1679%
>> > > Reliability 54.7459%
>> > > Reliability (standard deviation) 0.5005
>> > >
>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>> > >
>> > > *classification; sgd, with three categories:*
>> > > Running SGD Training
>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
>> > > 24168 training files
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
>> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
>> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
>> > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>> > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
>> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
>> > >     at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
>> > >     at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
>> > >     at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
>> > >     at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
>> > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >     at java.lang.Thread.run(Thread.java:701)
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
>> > > andrew.musselman@gmail.com> wrote:
>> > >
>> > > > Trying out the build today
>> > > >
>> > > >
>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
>> suneel_marthi@yahoo.com
>> > > >wrote:
>> > > >
>> > > >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
>> > > >> release. I will be rerolling the release today (in the next few hours) and
>> > > >> putting out a new release candidate in staging.
>> > > >>
>> > > >> Thanks for reporting this Andrew P.
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
>> > > ap.dev@outlook.com>
>> > > >> wrote:
>> > > >>
>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
>> > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
>> > > >> have run into some problems because of the Hadoop setup. I ran into some
>> > > >> problems in the example scripts, particularly with
>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
>> > > >> examples when I'm sure I've got Hadoop set up right.
>> > > >>
>> > > >>
>> > > >> Apache Maven 3.1.2-SNAPSHOT
>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
>> > > >> $MAHOUT_LOCAL=true
>> > > >> Hadoop 2.2.0
>> > > >>
>> > > >>
>> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
>> > > >>
>> > > >> b) Verify u r able to compile the distro
>> > > >>
>> > > >> mvn compile [passed with warnings]
>> > > >>
>> > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>> > > >> [WARNING] Multiple versions of scala libraries detected!
>> > > >>
>> > > >> c) Run through the unit tests: mvn clean test
>> > > >> mvn clean test [passed]
>> > > >>
>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>> > > >> Please run through all the different options in each script
>> > > >>
>> > > >> Running example scripts with $MAHOUT_LOCAL=true
>> > > >>
>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
>> > > >>
>> > > >>
>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>> > > >> [...]
>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> > > >>     at java.lang.Class.forName0(Native Method)
>> > > >>     at java.lang.Class.forName(Class.java:171)
>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>> > > >>
>> > > >>
>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>> > > >>
>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> > > >>     at java.lang.Class.forName0(Native Method)
>> > > >>     at java.lang.Class.forName(Class.java:171)
>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>> > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>> > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>> > > >>
>> > > >>
>> > > >> ./classify-20newsgroups.sh ->1 [works]
>> > > >> ./classify-20newsgroups.sh ->2 [works]
>> > > >>
>> > > >>
>> > > >> cluster-reuters.sh ->1 [works]
>> > > >> cluster-reuters.sh ->2 [works]
>> > > >> cluster-reuters.sh ->3 [works]
>> > > >>
>> > > >> Same error as noted previously in the thread:
>> > > >>
>> > > >> cluster-reuters.sh ->4 [0 clusters]
>> > > >>
>> > > >> [...]
>> > > >>
>> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> > > >> Num clusters: 0; maxDistance: 0.000000
>> > > >> [Dunn Index] First: Infinity
>> > > >> [Davies-Bouldin Index] First: NaN
>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>
>
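For anyone cross-checking the Statistics blocks, here is a minimal sketch of how accuracy and a textbook Cohen's kappa can be computed from a confusion matrix. It uses the 4-class matrix from the "classification; standard" ASF email run quoted above (rows are actual classes, columns are predicted). The function name `cohen_kappa` and the variable names are made up for illustration. The kappa it gives (roughly 0.79) is close to, but not the same as, the 0.7877 that Mahout printed, so Mahout may well compute its figure differently; treat this as a sanity check only.

```python
def cohen_kappa(matrix):
    """Textbook Cohen's kappa for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    This is the standard formula, not necessarily Mahout's own.
    """
    n = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Standard naive Bayes on the ASF email set: dev, general, user, commits.
ASF_STANDARD_NB = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]

total = sum(sum(row) for row in ASF_STANDARD_NB)
correct = sum(ASF_STANDARD_NB[i][i] for i in range(len(ASF_STANDARD_NB)))
accuracy = correct / total
kappa = cohen_kappa(ASF_STANDARD_NB)

print(f"total={total} correct={correct} accuracy={accuracy:.4%}")
print(f"kappa (textbook formula) = {kappa:.4f}")
```

The accuracy line should agree with the Summary block (5384 of 6133 correct); only the kappa value is expected to differ slightly.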
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
*classify-20newsgroups.sh*
*Complementary naive bayes:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11207 98.9406%
Incorrectly Classified Instances : 120 1.0594%
Total Classified Instances : 11327
=======================================================
Confusion Matrix
-------------------------------------------------------
a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
475 0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0   | 478 a = alt.atheism
0   597 1   1   0   1   1   0   0   0   0   1   0   2   1   0   0   0   0   0   | 605 b = comp.graphics
0   1   620 3   0   1   0   0   0   0   0   1   0   0   1   0   0   0   0   0   | 627 c = comp.os.ms-windows.misc
1   1   1   593 2   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   | 599 d = comp.sys.ibm.pc.hardware
0   1   1   0   568 0   1   0   0   0   1   1   2   0   0   0   0   1   0   0   | 576 e = comp.sys.mac.hardware
0   4   2   0   0   581 0   0   0   0   0   0   0   0   0   0   0   0   0   0   | 587 f = comp.windows.x
0   0   0   1   2   0   571 3   0   0   1   1   4   1   0   0   0   0   0   0   | 584 g = misc.forsale
0   0   0   1   0   0   0   589 1   0   0   1   1   0   0   0   0   0   0   0   | 593 h = rec.autos
0   0   0   0   0   0   0   1   565 0   0   0   0   0   1   0   0   0   0   0   | 567 i = rec.motorcycles
0   0   0   0   0   0   0   0   0   600 2   0   0   0   1   0   0   0   0   0   | 603 j = rec.sport.baseball
0   0   0   0   0   0   0   0   0   1   584 0   0   0   0   0   0   0   0   0   | 585 k = rec.sport.hockey
0   0   0   0   0   0   0   0   0   0   0   579 0   0   0   0   0   1   0   0   | 580 l = sci.crypt
0   0   0   1   3   0   2   0   0   2   0   0   567 1   2   1   0   0   0   0   | 579 m = sci.electronics
0   0   0   0   0   0   0   0   0   0   0   0   1   605 0   0   0   0   0   0   | 606 n = sci.med
0   0   0   0   0   0   0   0   0   0   0   0   0   0   602 0   0   0   0   0   | 602 o = sci.space
0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   602 0   0   1   0   | 604 p = soc.religion.christian
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   556 0   0   0   | 556 q = talk.politics.mideast
0   0   1   0   0   0   0   0   0   0   0   1   0   0   1   0   0   568 0   0   | 571 r = talk.politics.guns
11  0   0   0   0   0   0   0   0   1   0   0   0   1   3   8   1   4   338 2   | 369 s = talk.religion.misc
0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0   3   4   0   447 | 456 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.9806
Accuracy 98.9406%
Reliability 94.0932%
Reliability (standard deviation) 0.2163
Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 15870 ms (Minutes: 0.2645)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
[snip]
INFO: Complementary Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6715 89.3071%
Incorrectly Classified Instances : 804 10.6929%
Total Classified Instances : 7519
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
298 0 0 0 0 0 0 0 0 1 0 0 0 1 2 5 1 0 13 0 | 321 a = alt.atheism
0 298 11 6 1 12 2 2 1 1 3 8 3 4 2 4 1 4 4 1 | 368 b = comp.graphics
1 17 286 16 4 9 6 3 2 0 1 0 1 7 1 0 2 1 0 1 | 358 c = comp.os.ms-windows.misc
2 6 11 309 9 5 14 8 1 0 2 0 6 4 2 0 1 2 1 0 | 383 d = comp.sys.ibm.pc.hardware
0 10 8 7 334 7 5 5 2 0 3 0 2 1 1 0 1 1 0 0 | 387 e = comp.sys.mac.hardware
1 13 7 8 2 355 2 0 2 0 0 5 1 1 3 0 0 1 0 0 | 401 f = comp.windows.x
0 7 11 29 12 9 268 16 8 4 3 2 6 4 2 1 3 1 2 3 | 391 g = misc.forsale
0 1 0 0 3 0 7 362 8 2 2 1 2 0 2 0 1 2 0 4 | 397 h = rec.autos
0 0 0 1 0 0 1 0 423 0 0 0 2 1 0 1 0 0 0 0 | 429 i = rec.motorcycles
0 0 1 0 0 0 0 2 2 371 8 0 2 3 0 2 0 0 0 0 | 391 j = rec.sport.baseball
0 0 1 0 0 0 1 0 0 2 409 0 0 0 0 0 0 0 0 1 | 414 k = rec.sport.hockey
0 0 1 2 1 0 1 0 0 0 0 404 0 0 0 0 0 1 0 1 | 411 l = sci.crypt
0 5 4 11 1 3 7 9 2 5 3 3 339 2 6 0 1 1 2 1 | 405 m = sci.electronics
0 4 0 1 0 0 0 1 0 1 1 0 3 367 3 1 2 0 0 0 | 384 n = sci.med
0 1 2 0 1 0 2 0 0 1 0 0 1 1 375 0 1 0 0 0 | 385 o = sci.space
4 2 1 1 0 0 1 1 2 0 0 1 1 5 1 367 4 0 1 1 | 393 p = soc.religion.christian
0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 2 378 0 1 0 | 384 q = talk.politics.mideast
0 0 0 0 0 2 1 1 1 1 0 3 0 3 0 0 2 319 2 4 | 339 r = talk.politics.guns
32 0 0 1 0 0 0 0 0 1 1 1 0 2 2 26 5 7 175 6 | 259 s = talk.religion.misc
0 0 0 2 0 0 0 0 0 1 2 2 0 1 2 1 10 18 2 278 | 319 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8594
Accuracy 89.3071%
Reliability 84.611%
Reliability (standard deviation) 0.2148
Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
*Naive bayes:*
INFO: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11286 99.0869%
Incorrectly Classified Instances : 104 0.9131%
Total Classified Instances : 11390
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
474 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 | 477 a = alt.atheism
0 566 0 2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 569 b = comp.graphics
0 10 590 29 2 4 1 0 0 0 0 0 1 0 0 0 0 0 0 1 | 638 c = comp.os.ms-windows.misc
0 0 0 596 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 596 d = comp.sys.ibm.pc.hardware
0 0 0 0 575 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 | 577 e = comp.sys.mac.hardware
0 2 2 2 0 593 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 601 f = comp.windows.x
0 0 0 1 0 0 589 1 0 0 1 0 2 0 0 0 0 0 0 0 | 594 g = misc.forsale
0 0 0 0 0 0 0 594 0 0 0 0 0 0 0 0 0 0 0 0 | 594 h = rec.autos
0 0 0 0 0 0 0 0 611 0 0 0 0 0 0 0 0 0 0 0 | 611 i = rec.motorcycles
0 0 0 0 0 0 0 0 0 616 1 0 0 0 0 0 0 0 0 0 | 617 j = rec.sport.baseball
0 0 0 0 0 0 1 0 0 0 620 0 0 0 0 0 0 0 0 0 | 621 k = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0 0 580 0 0 0 0 0 1 0 0 | 581 l = sci.crypt
0 0 0 3 1 0 0 0 0 0 0 0 571 0 0 0 0 0 0 0 | 575 m = sci.electronics
0 0 0 0 0 0 0 0 0 0 0 0 2 583 0 0 0 0 0 0 | 585 n = sci.med
0 0 0 0 0 0 0 0 0 0 0 0 0 1 599 0 0 0 0 0 | 600 o = sci.space
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 | 616 p = soc.religion.christian
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 560 0 0 0 | 562 q = talk.politics.mideast
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 548 0 1 | 551 r = talk.politics.guns
10 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 2 344 1 | 359 s = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 2 0 462 | 466 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.9847
Accuracy 99.0869%
Reliability 94.3334%
Reliability (standard deviation) 0.2169
Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 14304 ms (Minutes: 0.2384)
+ echo 'Testing on holdout set'
Testing on holdout set
[snip]
INFO: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6718 90.1019%
Incorrectly Classified Instances : 738 9.8981%
Total Classified Instances : 7456
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
294 0 0 0 0 0 0 0 0 0 0 2 0 1 1 6 1 1 16 0 | 322 a = alt.atheism
0 345 6 14 6 11 6 0 0 0 0 5 7 1 3 0 0 0 0 0 | 404 b = comp.graphics
2 29 177 78 22 19 9 1 0 0 0 4 2 0 1 1 0 0 1 1 | 347 c = comp.os.ms-windows.misc
1 9 2 335 18 2 10 0 0 0 1 0 8 0 0 0 0 0 0 0 | 386 d = comp.sys.ibm.pc.hardware
1 4 2 13 347 3 5 1 0 0 1 0 7 1 0 0 0 1 0 0 | 386 e = comp.sys.mac.hardware
0 20 0 4 0 352 4 0 0 0 0 0 1 1 3 0 1 0 1 0 | 387 f = comp.windows.x
0 2 0 21 5 1 323 7 2 2 0 2 12 0 3 0 0 0 0 1 | 381 g = misc.forsale
0 1 0 0 1 0 15 363 8 1 0 0 4 1 0 0 0 1 0 1 | 396 h = rec.autos
0 1 0 0 0 0 6 6 370 0 0 0 0 1 0 0 0 0 1 0 | 385 i = rec.motorcycles
1 0 0 1 1 0 2 1 2 362 5 0 2 0 0 0 0 0 0 0 | 377 j = rec.sport.baseball
0 0 0 1 2 0 0 0 0 3 371 0 0 0 0 0 0 0 0 1 | 378 k = rec.sport.hockey
0 3 1 0 1 0 2 0 0 0 0 396 0 1 0 0 1 1 1 3 | 410 l = sci.crypt
0 7 0 7 7 2 6 4 0 0 0 1 369 2 2 0 0 0 0 2 | 409 m = sci.electronics
0 3 0 2 1 0 2 0 0 0 0 1 4 383 4 0 0 1 0 4 | 405 n = sci.med
0 5 0 0 1 0 3 0 0 0 0 0 1 0 374 1 0 0 1 1 | 387 o = sci.space
6 2 0 1 1 0 0 1 0 1 0 0 1 5 0 352 2 1 7 1 | 381 p = soc.religion.christian
1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 373 1 0 1 | 378 q = talk.politics.mideast
0 0 0 0 0 0 1 0 1 0 0 2 0 0 0 0 0 346 2 7 | 359 r = talk.politics.guns
26 1 0 1 0 0 0 2 0 1 1 0 0 1 1 20 2 6 200 7 | 269 s = talk.religion.misc
1 0 0 0 0 0 0 2 0 0 1 0 0 2 2 0 1 14 0 286 | 309 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8726
Accuracy 90.1019%
Reliability 85.4491%
Reliability (standard deviation) 0.2222
Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10878 ms (Minutes: 0.1813)
*SGD:*
7532 test files
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 5649 75%
Incorrectly Classified Instances : 1883 25%
Total Classified Instances : 7532
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
186 6 3 10 5 0 33 4 13 15 7 1 24 15 3 15 5 5 29 15 | 394 a = sci.space
5 309 0 3 2 5 0 0 0 1 9 21 2 0 0 18 4 4 1 1 | 385 b = comp.sys.mac.hardware
4 1 101 3 0 1 63 0 7 0 1 1 5 16 3 0 3 7 1 34 | 251 c = talk.religion.misc
11 12 1 265 1 10 3 0 0 17 10 11 5 2 0 11 3 6 21 0 | 389 d = comp.graphics
2 1 1 0 349 2 3 0 3 2 6 1 5 1 0 2 15 2 1 2 | 398 e = rec.motorcycles
7 20 3 19 2 254 6 0 2 11 2 39 7 2 0 4 2 2 9 3 | 394 f = comp.os.ms-windows.misc
2 1 13 0 0 0 247 0 1 1 3 0 6 2 4 0 2 3 5 29 | 319 g = alt.atheism
1 1 0 0 2 0 2 361 0 1 2 0 2 0 0 1 3 22 0 1 | 399 h = rec.sport.hockey
3 0 3 1 0 0 5 0 161 0 1 2 12 102 0 0 1 2 11 6 | 310 i = talk.politics.misc
2 8 0 19 0 19 0 0 1 294 10 11 4 2 0 5 0 3 11 6 | 395 j = comp.windows.x
2 10 0 1 1 0 0 0 0 1 347 13 2 1 0 5 3 2 2 0 | 390 k = misc.forsale
1 36 0 6 1 25 0 0 1 6 10 257 2 1 0 34 6 0 6 0 | 392 l = comp.sys.ibm.pc.hardware
2 2 2 2 1 0 12 0 0 6 10 4 312 5 2 13 11 3 3 6 | 396 m = sci.med
2 0 3 2 1 0 0 1 13 0 5 1 2 314 2 0 2 2 10 4 | 364 n = talk.politics.guns
1 0 2 1 1 0 34 1 33 1 3 0 1 8 271 1 4 5 6 3 | 376 o = talk.politics.mideast
3 14 0 8 2 8 3 1 1 7 12 29 6 2 1 245 13 2 32 4 | 393 p = sci.electronics
3 3 0 2 11 0 1 0 2 1 11 6 4 2 0 11 330 4 4 1 | 396 q = rec.autos
0 0 1 0 1 0 4 12 3 1 3 0 0 0 0 5 6 359 1 1 | 397 r = rec.sport.baseball
0 1 0 0 0 1 0 0 3 3 0 0 3 2 1 6 1 6 366 3 | 396 s = sci.crypt
0 2 11 1 1 0 40 0 1 2 3 4 2 1 0 5 0 2 2 321 | 398 t = soc.religion.christian
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.7073
Accuracy 75%
Reliability 70.6238%
Reliability (standard deviation) 0.2187
Log-likelihood mean : -1.1182
25%-ile : -1.6911
75%-ile : -0.0803
Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
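
For anyone cross-checking the holdout numbers above, here is a minimal sketch (plain Python, not part of the Mahout example scripts) that recomputes the accuracy percentage from the correct/total counts in each summary. The helper name accuracy_pct and the dictionary labels are made up for illustration; only the counts come from the logs.

```python
# Sanity check (not part of the Mahout scripts): recompute the accuracy
# percentages from the correct/total counts in the holdout summaries.

def accuracy_pct(correct, total):
    """Return correctly classified instances as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

# (correct, total) pairs copied from the three holdout summaries
holdout = {
    "Complementary naive bayes": (6715, 7519),
    "Naive bayes": (6718, 7456),
    "SGD": (5649, 7532),
}

for name, (correct, total) in holdout.items():
    print(f"{name}: {accuracy_pct(correct, total):.4f}%")
```

To four decimal places the computed values agree with the Accuracy lines reported in the three holdout summaries.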
On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <su...@yahoo.com> wrote:
> Thanks Andrew for reporting that. I rolled back the release to fix this
> and a few other issues.
>
> We have removed asf-examples*.sh from trunk as the sample file at the URL
> mentioned in your email is not available.
> This is something we need to fix and restore in 1.0.
>
>
>
>
>
>
>
> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> from the asf-email-examples.sh script:
>
> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>
> It looks like the
> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
> link is down.
>
> Is there somewhere else that we can get a subset of the ASF emails?
>
>
>
> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: andrew.musselman@gmail.com
> > To: dev@mahout.apache.org
> >
> > Sure thing; continuing to smoke test the other examples tonight
> >
> >
> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <suneel_marthi@yahoo.com
> >wrote:
> >
> > > Thanks Andrew M., see that some of the example scripts need to be fixed as
> > > they still refer to the deprecated algorithms.
> > > See that the Streaming KMeans has failed for you as well.
> > >
> > > I'll be rolling back the release today to fix these issues.
> > >
> > >
> > >
> > >
> > >
> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default
> 64-bit
> > > Linux AMI from tarball.
> > >
> > > All tests pass.
> > >
> > > *Output of examples:*
> > > *asf-email-examples.sh, run on mahout.apache.org
> > > <http://mahout.apache.org>:*
> > > *recommendations:*
> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > > 8 [12758:1.0,19409:1.0,11112:1.0]
> > > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > [snip]
> > >
> > > *clustering; kmeans:*
> > > [snip]
> > > Weight : [props - optional]: Point:
> > > 1.0 :
> > > [distance-squared=1.0193102046188427]:
> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> 7573:0.204,
> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > > 1.0 : [distance-squared=0.9823018320457279]:
> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> 5336:0.106,
> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > > 1.0 : [distance-squared=0.9509142993214911]:
> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > > 4419:0.076,
> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> 10225:0.081,
> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > > 43685:0.086, 44077:0.308,
> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > [snip]
> > >
> > > *clustering; dirichlet:*
> > > Get this complaint:
> > > Running Dirichlet with K = 8
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > > Unknown program 'dirichlet' chosen.
> > >
> > > *clustering: minhash:*
> > > Running Minhash
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > > Unknown program 'minhash' chosen.
> > >
> > > *classification; standard:*
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances : 5384 87.7874%
> > > Incorrectly Classified Instances : 749 12.2126%
> > > Total Classified Instances : 6133
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > a b c d
> > > <--Classified as
> > > 2949 7 531 25 | 3512 a = dev
> > > 0 0 0 0 | 0 b = general
> > > 99 8 1763 8 | 1878 c = user
> > > 41 1 29 672 | 743 d = commits
> > >
> > > =======================================================
> > > Statistics
> > > -------------------------------------------------------
> > > Kappa 0.7877
> > > Accuracy 87.7874%
> > > Reliability 53.658%
> > > Reliability (standard deviation) 0.4911
> > >
> > > *classification; complementary:*
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances : 5530 90.1679%
> > > Incorrectly Classified Instances : 603 9.8321%
> > > Total Classified Instances : 6133
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > a b c d <--Classified as
> > > 3168 0 276 68 | 3512 a = dev
> > > 0 0 0 0 | 0 b = general
> > > 196 0 1652 30 | 1878 c = user
> > > 25 0 8 710 | 743 d = commits
> > >
> > > =======================================================
> > > Statistics
> > > -------------------------------------------------------
> > > Kappa 0.8259
> > > Accuracy 90.1679%
> > > Reliability 54.7459%
> > > Reliability (standard deviation) 0.5005
> > >
> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > >
> > > *classification; sgd, with three categories:*
> > > Running SGD Training
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > 24168 training files
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > > at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > at java.lang.reflect.Method.invoke(Method.java:622)
> > > at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > > at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > > at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > > at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > > at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > at java.lang.Thread.run(Thread.java:701)
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > > andrew.musselman@gmail.com> wrote:
> > >
> > > > Trying out the build today
> > > >
> > > >
> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <
> suneel_marthi@yahoo.com
> > > >wrote:
> > > >
> > > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > > >> Release, will be rerolling the release today (in the next few hrs)
> and
> > > >> putting out a new release candidate in staging.
> > > >>
> > > >> Thanks for reporting this Andrew P.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > > ap.dev@outlook.com>
> > > >> wrote:
> > > >>
> > > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
> > > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > > > >> have run into some problems because of the Hadoop setup. Ran into some
> > > > >> problems in the example scripts, particularly with
> > > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > > > >> examples when I'm sure I've got Hadoop set up right.
> > > >>
> > > >>
> > > >> Apache Maven 3.1.2-SNAPSHOT
> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> "amd64",
> > > >> family: "unix"
> > > >> $MAHOUT_LOCAL=true
> > > >> Hadoop 2.2.0
> > > >>
> > > >>
> > > > >> a) Verify that u can unpack the release (tar or zip)
> > > > >> tar [passed]
> > > > >>
> > > > >> b) Verify u r able to compile the distro
> > > > >> mvn compile [passed with warnings]
> > > >>
> > > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > > > >> [WARNING] Multiple versions of scala libraries detected!
> > > >>
> > > >> c) Run through the unit tests: mvn clean test
> > > >> mvn clean test [passed]
> > > >>
> > > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > > > >> Please run through all the different options in each script
> > > > >>
> > > > >> Running example scripts with $MAHOUT_LOCAL=true
> > > > >>
> > > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > > >>
> > > >>
> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > >> [...]
> > > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >> at java.security.AccessController.doPrivileged(Native Method)
> > > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > >> at java.lang.Class.forName0(Native Method)
> > > > >> at java.lang.Class.forName(Class.java:171)
> > > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>
> > > >>
> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > >>
> > > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >> at java.security.AccessController.doPrivileged(Native Method)
> > > > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > >> at java.lang.Class.forName0(Native Method)
> > > > >> at java.lang.Class.forName(Class.java:171)
> > > > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > > >>
> > > >>
> > > >> ./classify-20newsgroups.sh ->1 [works]
> > > >> ./classify-20newsgroups.sh ->2 [works]
> > > >>
> > > >>
> > > >> cluster-reuters.sh ->1 [works]
> > > >> cluster-reuters.sh ->2 [works]
> > > >> cluster-reuters.sh ->3 [works]
> > > >>
> > > >> Same error as noted previously in the thread:
> > > >>
> > > >> cluster-reuters.sh ->4 [0 clusters]
> > > >>
> > > >> [...]
> > > >>
> > > >> WARNING: No qualcluster.props found on classpath, will use
> > > >> command-line arguments only
> > > >> Num clusters: 0; maxDistance: 0.000000
> > > >> [Dunn Index]
> > > >> First: Infinity
> > > >> [Davies-Bouldin Index] First: NaN
> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > > >> > From: suneel_marthi@yahoo.com
> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > > >> >
> > > >> > Third time's a Charm!!!
> > > >> >
> > > >> >
> > > >> > Here's the new URL for Mahout 0.9 Release:
> > > >> >
> > > >>
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > > >> >
> > > >> > For those volunteering to test this, some of the things to be
> > > verified:
> > > >> >
> > > >> > a) Verify that u can unpack the release (tar or zip)
> > > >> > b) Verify u r able to compile the distro
> > > >> > c) Run through the unit tests: mvn clean test
> > > >> > d) Run the example scripts
> > > >> under $MAHOUT_HOME/examples/bin. Please run through all the
> different
> > > >> options in each script.
> > > >> >
> > > >> >
> > > >> > Committers
> > > >> > and PMC members:
> > > >> > ---------------------------------------
> > > >> >
> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > > >> >
> > > >> >
> > > >> > Thanks and
> > > Regards.
> > > >>
> > > >
> > > >
> > >
>
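For anyone triaging example-script runs like the one quoted above, here is a minimal Python sketch that pulls the unresolved program names out of pasted MahoutDriver output. The two message patterns are copied from the log lines in this thread; the function name failed_programs is made up for illustration, and the quote-marker stripping assumes the usual "> " mail quoting.

```python
import re

# Message shapes copied from the MahoutDriver output quoted in this thread.
UNKNOWN_PROGRAM = re.compile(r"Unknown program\s+'([^']+)'\s+chosen\.")
UNABLE_TO_ADD = re.compile(r"Unable to add class:\s*(\S+)")


def failed_programs(log_text):
    """Return the sorted program names the driver reported it could not resolve.

    Hypothetical helper, not part of Mahout.
    """
    # Drop leading mail quote markers ("> > >>") from every line.
    cleaned = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    # Re-join lines that the mail client wrapped, using single spaces.
    flat = " ".join(cleaned.split())
    names = set(UNKNOWN_PROGRAM.findall(flat)) | set(UNABLE_TO_ADD.findall(flat))
    return sorted(names)
```

Feeding it the saved console output of a script run gives one list of names to compare against the algorithms that were deprecated or removed for 0.9.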
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Thanks Andrew for reporting that. I rolled back the release to fix this and a few other issues.
We have removed asf-examples*.sh from trunk as the sample file at the URL mentioned in your email is not available.
This is something we need to fix and restore in 1.0.
On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <ap...@outlook.com> wrote:
from the asf-email-examples.sh script:
# You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
# To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
# http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
It looks like the:
http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
link is down.
Is there somewhere else that we can get a subset of the ASF emails?
Date: Tue, 21 Jan 2014 09:48:06 -0800
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: andrew.musselman@gmail.com
> To: dev@mahout.apache.org
>
> Sure thing; continuing to smoke test the other examples tonight
>
>
> On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <su...@yahoo.com>wrote:
>
> > Thanks Andrew M., see that some of the example scripts need to be fixed as
> > they still refer to the deprecated algorithms.
> > See that the Streaming KMeans has failed for you as well.
> >
> > I'll be rolling back the release today to fix these issues.
> >
> >
> >
> >
> >
> > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > Linux AMI from tarball.
> >
> > All tests pass.
> >
> > *Output of examples:*
> > *asf-email-examples.sh, run on mahout.apache.org
> > <http://mahout.apache.org>:*
> > *recommendations:*
> > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > 8 [12758:1.0,19409:1.0,11112:1.0]
> > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > [snip]
> >
> > *clustering; kmeans:*
> > [snip]
> > Weight : [props - optional]: Point:
> > 1.0 :
> > [distance-squared=1.0193102046188427]:
> > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > 39789:0.110, 40743:0.190, 45775:0.086]
> > 1.0 : [distance-squared=0.9823018320457279]:
> > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > 1.0 : [distance-squared=0.9509142993214911]:
> > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > 4419:0.076,
> > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
> > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > 41280:0.065, 41696:0.072, 41947:0.118,
> > 43685:0.086, 44077:0.308,
> > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > [snip]
> >
> > *clustering; dirichlet:*
> > Get this complaint:
> > Running Dirichlet with K = 8
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'dirichlet' chosen.
> >
> > *clustering: minhash:*
> > Running Minhash
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'minhash' chosen.
> >
> > *classification; standard:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances : 5384 87.7874%
> > Incorrectly Classified Instances : 749 12.2126%
> > Total Classified Instances : 6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a b c d <--Classified as
> > 2949 7 531 25 | 3512 a = dev
> > 0 0 0 0 | 0 b = general
> > 99 8 1763 8 | 1878 c = user
> > 41 1 29 672 | 743 d = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa 0.7877
> > Accuracy 87.7874%
> > Reliability 53.658%
> > Reliability (standard deviation) 0.4911
> >
> > *classification; complementary:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances : 5530 90.1679%
> > Incorrectly Classified Instances : 603 9.8321%
> > Total Classified Instances : 6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a b c d <--Classified as
> > 3168 0 276 68 | 3512 a = dev
> > 0 0 0 0 | 0 b = general
> > 196 0 1652 30 | 1878 c = user
> > 25 0 8 710 | 743 d = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa 0.8259
> > Accuracy 90.1679%
> > Reliability 54.7459%
> > Reliability (standard deviation) 0.5005
> >
> > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes:
> > 0.34836666666666666)
> >
> > *classification; sgd, with three categories:*
> > Running SGD Training
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
> > will use command-line arguments only
> > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > 24168 training files
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
> > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:622)
> > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:622)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:701)
> >
> > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > > Trying out the build today
> > >
> > >
> > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com
> > >wrote:
> > >
> > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > >> Release, will be rerolling the release today (in the next few hrs) and
> > >> putting out a new release candidate in staging.
> > >>
> > >> Thanks for reporting this Andrew P.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > ap.dev@outlook.com>
> > >> wrote:
> > >>
> > >> I ran through the tests on a CentOS VM, AMD64, 2 cores, 4 GB RAM. Had
> > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > >> have run into some problems because of the Hadoop setup. Ran into some
> > >> problems in the example scripts, particularly with
> > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > >> examples when I'm sure I've got Hadoop set up right.
> > >>
> > >>
> > >> Apache Maven 3.1.2-SNAPSHOT
> > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> > >> family: "unix"
> > >> $MAHOUT_LOCAL=true
> > >> Hadoop 2.2.0
> > >>
> > >>
> > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
> > >> [passed ]
> > >>
> > >> b) Verify u r able to compile the
> > distro
> > >>
> > >> mvn compile- [passed with warnings]
> > >>
> > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala
> > >> version: 2.9.3
> > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala
> > >> version: 2.9.2
> > >> [WARNING] Multiple versions of scala libraries detected!
> > >>
> > >> c) Run through the unit tests: mvn clean test
> > >> mvn clean test [passed]
> > >>
> > >> d) Run the
> > >> example scripts under $MAHOUT_HOME/examples/bin.
> > >> Please run through all the different options in each script
> > >>
> > >> Running example scripts with $MAHOUT_LOCAL=true
> > >>
> > >>
> > ./cluster-syntheticcontrol.sh ->1 [works]
> > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > >>
> > >>
> > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >> [...]
> > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >> at java.security.AccessController.doPrivileged(Native Method)
> > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >> at java.lang.Class.forName0(Native Method)
> > >> at java.lang.Class.forName(Class.java:171)
> > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>
> > >>
> > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>
> > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >> at java.security.AccessController.doPrivileged(Native Method)
> > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >> at java.lang.Class.forName0(Native Method)
> > >> at java.lang.Class.forName(Class.java:171)
> > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on
> > >> classpath, will use command-line arguments only
> > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>
> > >>
> > >> ./classify-20newsgroups.sh ->1 [works]
> > >> ./classify-20newsgroups.sh ->2 [works]
> > >>
> > >>
> > >> cluster-reuters.sh ->1 [works]
> > >> cluster-reuters.sh ->2 [works]
> > >> cluster-reuters.sh ->3 [works]
> > >>
> > >> Same error as noted previously in the thread:
> > >>
> > >> cluster-reuters.sh ->4 [0 clusters]
> > >>
> > >> [...]
> > >>
> > >> WARNING: No qualcluster.props found on classpath, will use
> > >> command-line arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index]
> > >> First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >> > From: suneel_marthi@yahoo.com
> > >> > Subject: MAHOUT 0.9 Release - New URL
> > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >> >
> > >> > Third time's a Charm!!!
> > >> >
> > >> >
> > >> > Here's the new URL for Mahout 0.9 Release:
> > >> >
> > >>
> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >> >
> > >> > For those volunteering to test this, some of the things to be
> > verified:
> > >> >
> > >> > a) Verify that u can unpack the release (tar or zip)
> > >> > b) Verify u r able to compile the distro
> > >> > c) Run through the unit tests: mvn clean test
> > >> > d) Run the example scripts
> > >> under $MAHOUT_HOME/examples/bin. Please run through all the different
> > >> options in each script.
> > >> >
> > >> >
> > >> > Committers
> > >> > and PMC members:
> > >> > ---------------------------------------
> > >> >
> > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >> >
> > >> >
> > >> > Thanks and
> > Regards.
> > >>
> > >
> > >
> >
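A quick sanity check on the two naive Bayes confusion matrices quoted above: the accuracy in each Summary block should equal the diagonal sum over the total instance count. The Python sketch below assumes rows are the actual mailing lists (dev, general, user, commits) and columns are the predicted ones, as the "<--Classified as" header suggests; the function name accuracy_percent is only illustrative.

```python
def accuracy_percent(matrix):
    """Percentage of instances that lie on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Counts copied from the "classification; standard" output in this thread.
standard = [
    [2949, 7, 531, 25],   # a = dev
    [0, 0, 0, 0],         # b = general
    [99, 8, 1763, 8],     # c = user
    [41, 1, 29, 672],     # d = commits
]

# Counts copied from the "classification; complementary" output.
complementary = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]

print(f"{accuracy_percent(standard):.4f}")       # 87.7874
print(f"{accuracy_percent(complementary):.4f}")  # 90.1679
```

Both values agree with the reported 5384/6133 and 5530/6133, so the matrices and the Summary lines are at least internally consistent. The "general" row is all zeros, meaning no test instances came from that list.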
RE: MAHOUT 0.9 Release - New URL
Posted by Andrew Palumbo <ap...@outlook.com>.
from the asf-email-examples.sh script:
# You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
# To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
# http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
It looks like the:
http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
link is down.
Is there somewhere else that we can get a subset of the ASF emails?
Date: Tue, 21 Jan 2014 09:48:06 -0800
> Subject: Re: MAHOUT 0.9 Release - New URL
> From: andrew.musselman@gmail.com
> To: dev@mahout.apache.org
>
> Sure thing; continuing to smoke test the other examples tonight
>
>
> On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <su...@yahoo.com>wrote:
>
> > Thanks Andrew M., see that some of the example scripts need to be fixed as
> > they still refer to the deprecated algorithms.
> > See that the Streaming KMeans has failed for you as well.
> >
> > I'll be rolling back the release today to fix these issues.
> >
> >
> >
> >
> >
> > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > Linux AMI from tarball.
> >
> > All tests pass.
> >
> > *Output of examples:*
> > *asf-email-examples.sh, run on mahout.apache.org
> > <http://mahout.apache.org>:*
> > *recommendations:*
> > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > 8 [12758:1.0,19409:1.0,11112:1.0]
> > 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > [snip]
> >
> > *clustering; kmeans:*
> > [snip]
> > Weight : [props - optional]: Point:
> > 1.0 :
> > [distance-squared=1.0193102046188427]:
> > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > 39789:0.110, 40743:0.190, 45775:0.086]
> > 1.0 : [distance-squared=0.9823018320457279]:
> > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > 1.0 : [distance-squared=0.9509142993214911]:
> > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > 4419:0.076,
> > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
> > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > 41280:0.065, 41696:0.072, 41947:0.118,
> > 43685:0.086, 44077:0.308,
> > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > [snip]
> >
> > *clustering; dirichlet:*
> > Get this complaint:
> > Running Dirichlet with K = 8
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'dirichlet' chosen.
> >
> > *clustering: minhash:*
> > Running Minhash
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> >
> > /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> > classpath, will use command-line arguments only
> > Unknown program 'minhash' chosen.
> >
> > *classification; standard:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances : 5384 87.7874%
> > Incorrectly Classified Instances : 749 12.2126%
> > Total Classified Instances : 6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a b c d <--Classified as
> > 2949 7 531 25 | 3512 a = dev
> > 0 0 0 0 | 0 b = general
> > 99 8 1763 8 | 1878 c = user
> > 41 1 29 672 | 743 d = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa 0.7877
> > Accuracy 87.7874%
> > Reliability 53.658%
> > Reliability (standard deviation) 0.4911
> >
> > *classification; complementary:*
> > =======================================================
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances : 5530 90.1679%
> > Incorrectly Classified Instances : 603 9.8321%
> > Total Classified Instances : 6133
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a b c d <--Classified as
> > 3168 0 276 68 | 3512 a = dev
> > 0 0 0 0 | 0 b = general
> > 196 0 1652 30 | 1878 c = user
> > 25 0 8 710 | 743 d = commits
> >
> > =======================================================
> > Statistics
> > -------------------------------------------------------
> > Kappa 0.8259
> > Accuracy 90.1679%
> > Reliability 54.7459%
> > Reliability (standard deviation) 0.5005
> >
> > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> >
> > *classification; sgd, with three categories:*
> > Running SGD Training
> > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > 24168 training files
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> > 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
> > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:622)
> > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:622)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:701)
> >
> > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > andrew.musselman@gmail.com> wrote:
> >
> > > Trying out the build today
> > >
> > >
> > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com
> > >wrote:
> > >
> > >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
> > >> release. I will be rerolling the release today (in the next few hours) and
> > >> putting out a new release candidate in staging.
> > >>
> > >> Thanks for reporting this, Andrew P.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > ap.dev@outlook.com>
> > >> wrote:
> > >>
> > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
> > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > >> have run into some problems because of the Hadoop setup. Ran into some
> > >> problems in the example scripts, particularly with
> > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > >> examples when I'm sure I've got Hadoop set up right.
> > >>
> > >>
> > >> Apache Maven 3.1.2-SNAPSHOT
> > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > >> Java home: /usr/java/jdk1.6.0_45/jre
> > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> > >> family: "unix"
> > >> $MAHOUT_LOCAL=true
> > >> Hadoop 2.2.0
> > >>
> > >>
> > >> a) Verify that u can unpack the release (tar or zip) [passed (tar)]
> > >>
> > >> b) Verify u r able to compile the distro
> > >>
> > >> mvn compile [passed with warnings]
> > >>
> > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > >> [WARNING] Multiple versions of scala libraries detected!
> > >>
> > >> c) Run through the unit tests: mvn clean test
> > >> mvn clean test [passed]
> > >>
> > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > >> Please run through all the different options in each script
> > >>
> > >> Running example scripts with $MAHOUT_LOCAL=true
> > >>
> > >>
> > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > >>
> > >>
> > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > >> [...]
> > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >> at java.security.AccessController.doPrivileged(Native Method)
> > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >> at java.lang.Class.forName0(Native Method)
> > >> at java.lang.Class.forName(Class.java:171)
> > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >>
> > >>
> > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > >>
> > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > >> at java.security.AccessController.doPrivileged(Native Method)
> > >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > >> at java.lang.Class.forName0(Native Method)
> > >> at java.lang.Class.forName(Class.java:171)
> > >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > >>
> > >>
> > >> ./classify-20newsgroups.sh ->1 [works]
> > >> ./classify-20newsgroups.sh ->2 [works]
> > >>
> > >>
> > >> cluster-reuters.sh ->1 [works]
> > >> cluster-reuters.sh ->2 [works]
> > >> cluster-reuters.sh ->3 [works]
> > >>
> > >> Same error as noted previously in the thread:
> > >>
> > >> cluster-reuters.sh ->4 [0 clusters]
> > >>
> > >> [...]
> > >>
> > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > >> Num clusters: 0; maxDistance: 0.000000
> > >> [Dunn Index] First: Infinity
> > >> [Davies-Bouldin Index] First: NaN
> > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > >> > From: suneel_marthi@yahoo.com
> > >> > Subject: MAHOUT 0.9 Release - New URL
> > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > >> >
> > >> > Third time's a Charm!!!
> > >> >
> > >> >
> > >> > Here's the new URL for Mahout 0.9 Release:
> > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > >> >
> > >> > For those volunteering to test this, some of the things to be verified:
> > >> >
> > >> > a) Verify that u can unpack the release (tar or zip)
> > >> > b) Verify u r able to compile the distro
> > >> > c) Run through the unit tests: mvn clean test
> > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > >> >
> > >> >
> > >> > Committers
> > >> > and PMC members:
> > >> > ---------------------------------------
> > >> >
> > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > >> >
> > >> >
> > >> > Thanks and Regards.
> > >>
> > >
> > >
> >
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
Sure thing; continuing to smoke test the other examples tonight
On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <su...@yahoo.com> wrote:
> Thanks, Andrew M. I see that some of the example scripts need to be fixed, as
> they still refer to the deprecated algorithms.
> I see that Streaming KMeans has failed for you as well.
>
> I'll be rolling back the release today to fix these issues.
>
>
>
>
>
> On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> Linux AMI from tarball.
>
> All tests pass.
>
> *Output of examples:*
> *asf-email-examples.sh, run on mahout.apache.org:*
> *recommendations:*
> [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> 1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> 4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> 6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> 8 [12758:1.0,19409:1.0,11112:1.0]
> 11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> 14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> 15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> 16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> [snip]
>
> *clustering; kmeans:*
> [snip]
> Weight : [props - optional]: Point:
> 1.0 : [distance-squared=1.0193102046188427]:
> /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> 39789:0.110, 40743:0.190, 45775:0.086]
> 1.0 : [distance-squared=0.9823018320457279]:
> /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> 1.0 : [distance-squared=0.9509142993214911]:
> /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
> 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
> 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
> 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> [snip]
>
> *clustering; dirichlet:*
> Get this complaint:
> Running Dirichlet with K = 8
> Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> Unknown program 'dirichlet' chosen.
>
> *clustering: minhash:*
> Running Minhash
> Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> Unknown program 'minhash' chosen.
>
> *classification; standard:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 5384 87.7874%
> Incorrectly Classified Instances : 749 12.2126%
> Total Classified Instances : 6133
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d <--Classified as
> 2949 7 531 25 | 3512 a = dev
> 0 0 0 0 | 0 b = general
> 99 8 1763 8 | 1878 c = user
> 41 1 29 672 | 743 d = commits
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.7877
> Accuracy 87.7874%
> Reliability 53.658%
> Reliability (standard deviation) 0.4911
>
> *classification; complementary:*
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances : 5530 90.1679%
> Incorrectly Classified Instances : 603 9.8321%
> Total Classified Instances : 6133
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d <--Classified as
> 3168 0 276 68 | 3512 a = dev
> 0 0 0 0 | 0 b = general
> 196 0 1652 30 | 1878 c = user
> 25 0 8 710 | 743 d = commits
>
> =======================================================
> Statistics
> -------------------------------------------------------
> Kappa 0.8259
> Accuracy 90.1679%
> Reliability 54.7459%
> Reliability (standard deviation) 0.5005
>
> 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
>
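The accuracy figures in the two summaries above can be rechecked from the confusion matrices. Below is a minimal sketch in Python, assuming each row holds the messages of one actual list and each column the list they were classified as (the row totals printed after the "|" are consistent with that reading). The names STANDARD, COMPLEMENTARY and accuracy are illustrative only and are not part of Mahout.

```python
# Confusion matrices copied from the "standard" and "complementary" runs.
# Assumed layout: rows = actual list, columns = predicted list,
# in the order dev, general, user, commits.
STANDARD = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]
COMPLEMENTARY = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]


def accuracy(matrix):
    """Return (correct, total, percent correct) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, 100.0 * correct / total


for name, matrix in (("standard", STANDARD), ("complementary", COMPLEMENTARY)):
    correct, total, percent = accuracy(matrix)
    print(f"{name}: {correct} of {total} correct ({percent:.4f}%)")
```

The diagonal sums are 5384 and 5530 out of 6133, which agrees with the "Correctly Classified Instances" lines in both summaries.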
> *classification; sgd, with three categories:*
> Running SGD Training
> Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments: {--cardinality=[100000], --categories=[3], --endPhase=[2147483647], --input=[asf-output/classification/sgd/splits/mapRedOut/], --output=[asf-output/classification/sgd/models], --poolSize=[5], --startPhase=[0], --tempDir=[temp], --threads=[20]}
> 24168 training files
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> 0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
> 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:622)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:622)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:701)
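One possible reading of the trace above, offered as a guess and not confirmed anywhere in this thread: the job was started with --categories=[3], while the confusion matrices earlier in this mail list four labels (dev, general, user, commits). The sketch below is not Mahout's code. It assumes the gradient step writes a 1 at index actual - 1 of a vector with categories - 1 slots, so a label index of 3 would land on slot 2 of a 2-slot vector, which matches the "ArrayIndexOutOfBoundsException: 2" raised from DenseVector.setQuick. The function name gradient_target is hypothetical.

```python
def gradient_target(actual, num_categories):
    """Build the target vector for one training example.

    Assumption (not confirmed in this thread): category 0 is the implicit
    reference class, so the vector has num_categories - 1 slots and a
    category k > 0 is marked at slot k - 1.
    """
    target = [0.0] * (num_categories - 1)
    if actual != 0:
        target[actual - 1] = 1.0  # IndexError when actual >= num_categories
    return target


# Label indexes 0..2 fit a model built for three categories.
for label in range(3):
    print(label, gradient_target(label, 3))

# A fourth label (index 3) targets slot 2 of a 2-slot vector and fails.
try:
    gradient_target(3, 3)
except IndexError as err:
    print("label 3 with 3 categories:", err)
```

If this reading is right, the category count and the set of labels in the training data disagree; the thread does not say which of the two should change.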
>
> On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> andrew.musselman@gmail.com> wrote:
>
> > Trying out the build today
> >
> >
> > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com
> >wrote:
> >
> >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
> >> release. I will be rerolling the release today (in the next few hours) and
> >> putting out a new release candidate in staging.
> >>
> >> Thanks for reporting this, Andrew P.
> >>
> >>
> >>
> >>
> >>
> >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> ap.dev@outlook.com>
> >> wrote:
> >>
> >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
> >> a bit of trouble getting the Hadoop natives to compile and therefore may
> >> have run into some problems because of the Hadoop setup. Ran into some
> >> problems in the example scripts, particularly with
> >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> >> examples when I'm sure I've got Hadoop set up right.
> >>
> >>
> >> Apache Maven 3.1.2-SNAPSHOT
> >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> >> Java home: /usr/java/jdk1.6.0_45/jre
> >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> >> family: "unix"
> >> $MAHOUT_LOCAL=true
> >> Hadoop 2.2.0
> >>
> >>
> >> a) Verify that u can unpack the release (tar or zip) [passed (tar)]
> >>
> >> b) Verify u r able to compile the distro
> >>
> >> mvn compile [passed with warnings]
> >>
> >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> >> [WARNING] Multiple versions of scala libraries detected!
> >>
> >> c) Run through the unit tests: mvn clean test
> >> mvn clean test [passed]
> >>
> >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> >> Please run through all the different options in each script
> >>
> >> Running example scripts with $MAHOUT_LOCAL=true
> >>
> >>
> >> ./cluster-syntheticcontrol.sh ->1 [works]
> >> ./cluster-syntheticcontrol.sh ->2 [works]
> >> ./cluster-syntheticcontrol.sh ->3 [works]
> >>
> >>
> >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> >> [...]
> >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >> at java.security.AccessController.doPrivileged(Native Method)
> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >> at java.lang.Class.forName0(Native Method)
> >> at java.lang.Class.forName(Class.java:171)
> >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> >>
> >>
> >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> >>
> >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> >> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >> at java.security.AccessController.doPrivileged(Native Method)
> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >> at java.lang.Class.forName0(Native Method)
> >> at java.lang.Class.forName(Class.java:171)
> >> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> >> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on
> >> classpath, will use command-line arguments only
> >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> >>
> >>
> >> ./classify-20newsgroups.sh ->1 [works]
> >> ./classify-20newsgroups.sh ->2 [works]
> >>
> >>
> >> cluster-reuters.sh ->1 [works]
> >> cluster-reuters.sh ->2 [works]
> >> cluster-reuters.sh ->3 [works]
> >>
> >> Same error as noted previously in the thread:
> >>
> >> cluster-reuters.sh ->4 [0 clusters]
> >>
> >> [...]
> >>
> >> WARNING: No qualcluster.props found on classpath, will use
> >> command-line arguments only
> >> Num clusters: 0; maxDistance: 0.000000
> >> [Dunn Index] First: Infinity
> >> [Davies-Bouldin Index] First: NaN
> >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> >> INFO: Program took 669 ms (Minutes: 0.01115)
> >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> >>
> >>
> >>
> >>
> >>
> >>
> >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> >> > From: suneel_marthi@yahoo.com
> >> > Subject: MAHOUT 0.9 Release - New URL
> >> > To: user@mahout.apache.org; dev@mahout.apache.org
> >> >
> >> > Third time's a Charm!!!
> >> >
> >> >
> >> > Here's the new URL for Mahout 0.9 Release:
> >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >> >
> >> > For those volunteering to test this, some of the things to be verified:
> >> >
> >> > a) Verify that u can unpack the release (tar or zip)
> >> > b) Verify u r able to compile the distro
> >> > c) Run through the unit tests: mvn clean test
> >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> >> >
> >> >
> >> > Committers
> >> > and PMC members:
> >> > ---------------------------------------
> >> >
> >> > Need 'at least 3 +1 votes' for the Release to pass.
> >> >
> >> >
> >> > Thanks and Regards.
> >>
> >
> >
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Thanks, Andrew M. I see that some of the example scripts need to be fixed, as they still refer to the deprecated algorithms.
I also see that Streaming KMeans has failed for you.
I'll be rolling back the release today to fix these issues.
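A quick way to spot a stale script reference before running a script is to check whether the class it names is actually packaged in the job jar. The following is only a minimal sketch under stated assumptions: the helper name jar_has_class is hypothetical, it inspects only the top-level entries of the archive (not jars nested inside it), and it does not consult the short program names that the Mahout driver resolves.

```python
import sys
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the fully qualified class is a top-level entry in the jar.

    Hypothetical helper for illustration; not part of Mahout.
    """
    # A class a.b.C is stored in a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Only runs when called as: python check_jar.py <jar> <fully.qualified.Class>
if __name__ == "__main__" and len(sys.argv) == 3:
    found = jar_has_class(sys.argv[1], sys.argv[2])
    print(sys.argv[2], "found" if found else "NOT found", "in", sys.argv[1])
```

For example, it could be pointed at examples/target/mahout-examples-0.9-job.jar with a class name such as org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job, the class reported as missing earlier in this thread.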
On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <an...@gmail.com> wrote:
Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
Linux AMI from tarball.
All tests pass.
*Output of examples:*
*asf-email-examples.sh, run on mahout.apache.org
<http://mahout.apache.org>:*
*recommendations:*
[ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
8 [12758:1.0,19409:1.0,11112:1.0]
11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
[snip]
*clustering; kmeans:*
[snip]
Weight : [props - optional]: Point:
1.0 : [distance-squared=1.0193102046188427]:
/commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
[1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
39789:0.110, 40743:0.190, 45775:0.086]
1.0 : [distance-squared=0.9823018320457279]:
/commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
[1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
1.0 : [distance-squared=0.9509142993214911]:
/commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
[648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
[snip]
*clustering; dirichlet:*
Get this complaint:
Running Dirichlet with K = 8
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
classpath, will use command-line arguments only
Unknown program 'dirichlet' chosen.
*clustering: minhash:*
Running Minhash
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
classpath, will use command-line arguments only
Unknown program 'minhash' chosen.
*classification; standard:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 5384 87.7874%
Incorrectly Classified Instances : 749 12.2126%
Total Classified Instances : 6133
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d <--Classified as
2949 7 531 25 | 3512 a = dev
0 0 0 0 | 0 b = general
99 8 1763 8 | 1878 c = user
41 1 29 672 | 743 d = commits
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.7877
Accuracy 87.7874%
Reliability 53.658%
Reliability (standard deviation) 0.4911
*classification; complementary:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 5530 90.1679%
Incorrectly Classified Instances : 603 9.8321%
Total Classified Instances : 6133
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d <--Classified as
3168 0 276 68 | 3512 a = dev
0 0 0 0 | 0 b = general
196 0 1652 30 | 1878 c = user
25 0 8 710 | 743 d = commits
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8259
Accuracy 90.1679%
Reliability 54.7459%
Reliability (standard deviation) 0.5005
14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes:
0.34836666666666666)
*classification; sgd, with three categories:*
Running SGD Training
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:58:00 WARN driver.MahoutDriver: No
org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
will use command-line arguments only
14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
{--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
--input=[asf-output/classification/sgd/splits/mapRedOut/],
--output=[asf-output/classification/sgd/models], --poolSize=[5],
--startPhase=[0], --tempDir=[temp], --threads=[20]}
24168 training files
0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:
> Trying out the build today
>
>
> On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <su...@yahoo.com>wrote:
>
>> This is an issue (trivial one though) that needs to be fixed for 0.9
>> Release, will be rerolling the release today (in the next few hrs) and
>> putting out a new release candidate in staging.
>>
>> Thanks for reporting this Andrew P.
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
Linux AMI from tarball.
All tests pass.
*Output of examples:*
*asf-email-examples.sh, run on mahout.apache.org
<http://mahout.apache.org>:*
*recommendations:*
[ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
1 [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
4 [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
6 [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
8 [12758:1.0,19409:1.0,11112:1.0]
11 [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
14 [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
15 [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
16 [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
[snip]
*clustering; kmeans:*
[snip]
Weight : [props - optional]: Point:
1.0 : [distance-squared=1.0193102046188427]:
/commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
[1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
39789:0.110, 40743:0.190, 45775:0.086]
1.0 : [distance-squared=0.9823018320457279]:
/commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
[1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
1.0 : [distance-squared=0.9509142993214911]:
/commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
[648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
[snip]
*clustering; dirichlet:*
Get this complaint:
Running Dirichlet with K = 8
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
classpath, will use command-line arguments only
Unknown program 'dirichlet' chosen.
*clustering: minhash:*
Running Minhash
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
classpath, will use command-line arguments only
Unknown program 'minhash' chosen.
*classification; standard:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 5384 87.7874%
Incorrectly Classified Instances : 749 12.2126%
Total Classified Instances : 6133
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d <--Classified as
2949 7 531 25 | 3512 a = dev
0 0 0 0 | 0 b = general
99 8 1763 8 | 1878 c = user
41 1 29 672 | 743 d = commits
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.7877
Accuracy 87.7874%
Reliability 53.658%
Reliability (standard deviation) 0.4911
*classification; complementary:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 5530 90.1679%
Incorrectly Classified Instances : 603 9.8321%
Total Classified Instances : 6133
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d <--Classified as
3168 0 276 68 | 3512 a = dev
0 0 0 0 | 0 b = general
196 0 1652 30 | 1878 c = user
25 0 8 710 | 743 d = commits
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8259
Accuracy 90.1679%
Reliability 54.7459%
Reliability (standard deviation) 0.5005
14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes:
0.34836666666666666)
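As a sanity check on the two summaries above, the reported accuracy figures can be recomputed from the confusion matrices themselves. This is only a minimal sketch: the matrices are copied by hand from the output above, rows are taken to be the actual class and columns the predicted class (consistent with the row totals shown), and the function name accuracy is illustrative rather than anything from Mahout.

```python
# Confusion matrices copied from the output above.
# Rows: actual class; columns: predicted class (dev, general, user, commits).
STANDARD = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]
COMPLEMENTARY = [
    [3168, 0, 276, 68],
    [0, 0, 0, 0],
    [196, 0, 1652, 30],
    [25, 0, 8, 710],
]


def accuracy(matrix):
    """Return (correct, total, accuracy in percent) for a square confusion matrix."""
    # Correct predictions sit on the diagonal
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, 100.0 * correct / total


if __name__ == "__main__":
    for name, m in (("standard", STANDARD), ("complementary", COMPLEMENTARY)):
        correct, total, pct = accuracy(m)
        print(f"{name}: {correct}/{total} = {pct:.4f}%")
```

The diagonal sums come to 5384 and 5530 out of 6133, which agrees with the "Correctly Classified Instances" lines in both summaries.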
*classification; sgd, with three categories:*
Running SGD Training
Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/01/21 05:58:00 WARN driver.MahoutDriver: No
org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
will use command-line arguments only
14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
{--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
--input=[asf-output/classification/sgd/splits/mapRedOut/],
--output=[asf-output/classification/sgd/models], --poolSize=[5],
--startPhase=[0], --tempDir=[temp], --threads=[20]}
24168 training files
0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 120 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
0.32 50635.00 32642.00 437.09 1.0551175e-08 1.0000000e-08 5000 -0.378 87.10 none
0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
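
A note on the trace above: one possible reading (an assumption, not something confirmed in this thread) is that the gradient step writes a 1 at position actual - 1 of a vector holding one score fewer than the number of configured categories, so a target label at or beyond that category count falls outside the vector. The sketch below reproduces only that indexing pattern with plain arrays. GradientIndexSketch and gradient are hypothetical names and this is not Mahout code.

```java
import java.util.Arrays;

final class GradientIndexSketch {
    // Hypothetical stand-in for a gradient step. Assumption: for k categories
    // the score vector has k - 1 entries, with category 0 as the implicit reference.
    static double[] gradient(int actual, double[] scores) {
        double[] r = new double[scores.length];
        if (actual != 0) {
            // Throws ArrayIndexOutOfBoundsException when actual > scores.length.
            r[actual - 1] = 1.0;
        }
        for (int i = 0; i < r.length; i++) {
            r[i] -= scores[i];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] scores = {0.2, 0.3}; // two scores, i.e. a model set up for 3 categories
        System.out.println(Arrays.toString(gradient(2, scores)));
        try {
            gradient(3, scores); // a fourth label maps to index 2 of a length-2 array
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

If this reading holds, the thing to check would be whether the training files contain more distinct target labels than the category count the learner was created with.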
On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:
> Trying out the build today
>
>
> On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <su...@yahoo.com>wrote:
>
>> This is an issue (trivial one though) that needs to be fixed for 0.9
>> Release, will be rerolling the release today (in the next few hrs) and
>> putting out a new release candidate in staging.
>>
>> Thanks for reporting this Andrew P.
>>
>>
>>
>>
>>
>> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com>
>> wrote:
>>
>> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
>> a bit of trouble getting the Hadoop natives to compile and therefore may
>> have run into some problems because of the Hadoop setup. Ran into some
>> problems in the example scripts, particularly with
>> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
>> examples when I'm sure I've got Hadoop set up right.
>>
>>
>> Apache Maven 3.1.2-SNAPSHOT
>> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
>> Java home: /usr/java/jdk1.6.0_45/jre
>> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
>> family: "unix"
>> $MAHOUT_LOCAL=true
>> Hadoop 2.2.0
>>
>>
>> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
>>
>> b) Verify u r able to compile the distro
>>
>> mvn compile- [passed with warnings]
>>
>> [WARNING] Expected all dependencies to require Scala version: 2.9.3
>> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
>> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
>> [WARNING] Multiple versions of scala libraries detected!
>>
>> c) Run through the unit tests: mvn clean test
>> mvn clean test [passed]
>>
>> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
>> Please run through all the different options in each script
>>
>> Running example scripts with $MAHOUT_LOCAL=true
>>
>> ./cluster-syntheticcontrol.sh ->1 [works]
>> ./cluster-syntheticcontrol.sh ->2 [works]
>> ./cluster-syntheticcontrol.sh ->3 [works]
>>
>>
>> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
>> [...]
>> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:171)
>> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>>
>>
>> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>>
>> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:171)
>> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
>> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>>
>>
>> ./classify-20newsgroups.sh ->1 [works]
>> ./classify-20newsgroups.sh ->2 [works]
>>
>>
>> cluster-reuters.sh ->1 [works]
>> cluster-reuters.sh ->2 [works]
>> cluster-reuters.sh ->3 [works]
>>
>> Same error as noted previously in the thread:
>>
>> cluster-reuters.sh ->4 [0 clusters]
>>
>> [...]
>>
>> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
>> Num clusters: 0; maxDistance: 0.000000
>> [Dunn Index] First: Infinity
>> [Davies-Bouldin Index] First: NaN
>> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 669 ms (Minutes: 0.01115)
>> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>>
>>
>>
>>
>>
>>
>> > Date: Thu, 16 Jan 2014 06:41:09 -0800
>> > From: suneel_marthi@yahoo.com
>> > Subject: MAHOUT 0.9 Release - New URL
>> > To: user@mahout.apache.org; dev@mahout.apache.org
>> >
>> > Third time's a Charm!!!
>> >
>> >
>> > Here's the new URL for Mahout 0.9 Release:
>> >
>> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>> >
>> > For those volunteering to test this, some of the things to be verified:
>> >
>> > a) Verify that u can unpack the release (tar or zip)
>> > b) Verify u r able to compile the distro
>> > c) Run through the unit tests: mvn clean test
>> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>> >
>> >
>> > Committers
>> > and PMC members:
>> > ---------------------------------------
>> >
>> > Need 'at least 3 +1 votes' for the Release to pass.
>> >
>> >
>> > Thanks and Regards.
>>
>
>
Re: MAHOUT 0.9 Release - New URL
Posted by Andrew Musselman <an...@gmail.com>.
Trying out the build today
On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <su...@yahoo.com>wrote:
> This is an issue (trivial one though) that needs to be fixed for 0.9
> Release, will be rerolling the release today (in the next few hrs) and
> putting out a new release candidate in staging.
>
> Thanks for reporting this Andrew P.
>
>
>
>
>
> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com>
> wrote:
>
> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had a
> bit of trouble getting the Hadoop natives to compile and therefore may have
> run into some problems because of the Hadoop setup. Ran into some problems
> in the example scripts, particularly with ./cluster-syntheticcontrol.sh
> ->4,5. I will run through the rest of the examples when I'm sure I've got
> Hadoop set up right.
>
>
> Apache Maven 3.1.2-SNAPSHOT
> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> Java home: /usr/java/jdk1.6.0_45/jre
> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64",
> family: "unix"
> $MAHOUT_LOCAL=true
> Hadoop 2.2.0
>
>
> a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
>
> b) Verify u r able to compile the distro
>
> mvn compile- [passed with warnings]
>
> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> [WARNING] Multiple versions of scala libraries detected!
>
> c) Run through the unit tests: mvn clean test
> mvn clean test [passed]
>
> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> Please run through all the different options in each script
>
> Running example scripts with $MAHOUT_LOCAL=true
>
> ./cluster-syntheticcontrol.sh ->1 [works]
> ./cluster-syntheticcontrol.sh ->2 [works]
> ./cluster-syntheticcontrol.sh ->3 [works]
>
>
> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> [...]
> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:171)
> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
>
>
> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
>
> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:171)
> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
>
>
> ./classify-20newsgroups.sh ->1 [works]
> ./classify-20newsgroups.sh ->2 [works]
>
>
> cluster-reuters.sh ->1 [works]
> cluster-reuters.sh ->2 [works]
> cluster-reuters.sh ->3 [works]
>
> Same error as noted previously in the thread:
>
> cluster-reuters.sh ->4 [0 clusters]
>
> [...]
>
> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> Num clusters: 0; maxDistance: 0.000000
> [Dunn Index] First: Infinity
> [Davies-Bouldin Index] First: NaN
> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 669 ms (Minutes: 0.01115)
> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
>
>
>
>
>
>
> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > From: suneel_marthi@yahoo.com
> > Subject: MAHOUT 0.9 Release - New URL
> > To: user@mahout.apache.org; dev@mahout.apache.org
> >
> > Third time's a Charm!!!
> >
> >
> > Here's the new URL for Mahout 0.9 Release:
> >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> >
> > For those volunteering to test this, some of the things to be verified:
> >
> > a) Verify that u can unpack the release (tar or zip)
> > b) Verify u r able to compile the distro
> > c) Run through the unit tests: mvn clean test
> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> >
> >
> > Committers
> > and PMC members:
> > ---------------------------------------
> >
> > Need 'at least 3 +1 votes' for the Release to pass.
> >
> >
> > Thanks and Regards.
>
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
This is an issue (trivial one though) that needs to be fixed for 0.9 Release, will be rerolling the release today (in the next few hrs) and putting out a new release candidate in staging.
Thanks for reporting this Andrew P.
On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap...@outlook.com> wrote:
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the Hadoop setup. Ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when I'm sure I've got Hadoop set up right.
Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0
a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
b) Verify u r able to compile the distro
mvn compile- [passed with warnings]
[WARNING] Expected all dependencies to require Scala version: 2.9.3
[WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
[WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
[WARNING] Multiple versions of scala libraries detected!
c) Run through the unit tests: mvn clean test
mvn clean test [passed]
d) Run the example scripts under $MAHOUT_HOME/examples/bin.
Please run through all the different options in each script
Running example scripts with $MAHOUT_LOCAL=true
./cluster-syntheticcontrol.sh ->1 [works]
./cluster-syntheticcontrol.sh ->2 [works]
./cluster-syntheticcontrol.sh ->3 [works]
./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
[...]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
./classify-20newsgroups.sh ->1 [works]
./classify-20newsgroups.sh ->2 [works]
cluster-reuters.sh ->1 [works]
cluster-reuters.sh ->2 [works]
cluster-reuters.sh ->3 [works]
Same error as noted previously in the thread:
cluster-reuters.sh ->4 [0 clusters]
[...]
WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 669 ms (Minutes: 0.01115)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL
> To: user@mahout.apache.org; dev@mahout.apache.org
>
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
>
> Committers
> and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
Re: MAHOUT 0.9 Release - New URL
Posted by Suneel Marthi <su...@yahoo.com>.
Hmmm... that's an issue. Since both Dirichlet and Meanshift clustering have been removed from 0.9, cluster-syntheticcontrol.sh options 4,5 are not gonna work and should have been removed for 0.9.
To PMC,
-> rollback the release, fix this issue (and other patches that were submitted in the last few days) and put out another release ?
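
The failure mode described here can be illustrated with a small probe. This is a sketch under assumptions, not the actual MahoutDriver or cluster-syntheticcontrol.sh logic: JobClassProbe and isOnClasspath are hypothetical names, the two job class names are taken from the warnings quoted below, and the check relies on the standard Class.forName lookup, which is the call the quoted trace shows failing.

```java
import java.util.Arrays;
import java.util.List;

final class JobClassProbe {
    // Returns true when the named class can be located by this class's loader.
    // initialize = false: only look the class up, do not run its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, JobClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Class file found but unusable; treat as unavailable for this sketch.
            return false;
        }
    }

    public static void main(String[] args) {
        // Job classes behind options 4 and 5, as named in the quoted warnings.
        List<String> jobs = Arrays.asList(
            "org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job",
            "org.apache.mahout.clustering.syntheticcontrol.meanshift.Job");
        for (String job : jobs) {
            System.out.println(job + " -> " + (isOnClasspath(job) ? "available" : "missing"));
        }
    }
}
```

Run against a 0.9 classpath, both names would be expected to report as missing, consistent with the ClassNotFoundException in the report. A menu script could use a check of this kind to hide options whose job class is gone.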
On Monday, January 20, 2014 12:33 AM, Andrew Palumbo <ap...@outlook.com> wrote:
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the Hadoop setup. Ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when I'm sure I've got Hadoop set up right.
Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0
a) Verify that u can unpack the release (tar or zip) ...passed (tar) [passed]
b) Verify u r able to compile the distro
mvn compile- [passed with warnings]
[WARNING] Expected all dependencies to require Scala version: 2.9.3
[WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
[WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
[WARNING] Multiple versions of scala libraries detected!
c) Run through the unit tests: mvn clean test
mvn clean test [passed]
d) Run the example scripts under $MAHOUT_HOME/examples/bin.
Please run through all the different options in each script
Running example scripts with $MAHOUT_LOCAL=true
./cluster-syntheticcontrol.sh ->1 [works]
./cluster-syntheticcontrol.sh ->2 [works]
./cluster-syntheticcontrol.sh ->3 [works]
./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
[...]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
./classify-20newsgroups.sh ->1 [works]
./classify-20newsgroups.sh ->2 [works]
cluster-reuters.sh ->1 [works]
cluster-reuters.sh ->2 [works]
cluster-reuters.sh ->3 [works]
Same error as noted previously in the thread:
cluster-reuters.sh ->4 [0 clusters]
[...]
WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 669 ms (Minutes: 0.01115)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> Date: Thu, 16 Jan 2014 06:41:09 -0800
> From: suneel_marthi@yahoo.com
> Subject: MAHOUT 0.9 Release - New URL
> To: user@mahout.apache.org; dev@mahout.apache.org
>
> Third time's a Charm!!!
>
>
> Here's the new URL for Mahout 0.9 Release:
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
>
> For those volunteering to test this, some of the things to be verified:
>
> a) Verify that u can unpack the release (tar or zip)
> b) Verify u r able to compile the distro
> c) Run through the unit tests: mvn clean test
> d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
>
>
> Committers
> and PMC members:
> ---------------------------------------
>
> Need 'at least 3 +1 votes' for the Release to pass.
>
>
> Thanks and Regards.
RE: MAHOUT 0.9 Release - New URL
Posted by Andrew Palumbo <ap...@outlook.com>.
I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had a bit of trouble getting the Hadoop natives to compile and therefore may have run into some problems because of the Hadoop setup. Ran into some problems in the example scripts, particularly with ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the examples when I'm sure I've got Hadoop set up right.
Apache Maven 3.1.2-SNAPSHOT
Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
Java home: /usr/java/jdk1.6.0_45/jre
OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
$MAHOUT_LOCAL=true
Hadoop 2.2.0
a) Verify that u can unpack the release (tar or zip) [passed (tar)]
b) Verify u r able to compile the distro
mvn compile [passed with warnings]
[WARNING] Expected all dependencies to require Scala version: 2.9.3
[WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
[WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
[WARNING] Multiple versions of scala libraries detected!
c) Run through the unit tests: mvn clean test
mvn clean test [passed]
d) Run the example scripts under $MAHOUT_HOME/examples/bin.
Please run through all the different options in each script
Running example scripts with $MAHOUT_LOCAL=true
./cluster-syntheticcontrol.sh ->1 [works]
./cluster-syntheticcontrol.sh ->2 [works]
./cluster-syntheticcontrol.sh ->3 [works]
./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
[...]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
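Both traces above end in Class.forName being called from MahoutDriver.addClass, so the driver classes the script asks for are simply not on the classpath of this build. A minimal sketch of how one might check that independently of the scripts is below. ClasspathCheck, isLoadable and missing are hypothetical names made up for illustration and are not part of Mahout; the check would need to be run with the Mahout examples job jar on the classpath to say anything about the release itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Mahout: reports which class names
// cannot be located by the current class loader.
final class ClasspathCheck {

    // Returns true if the named class can be found on the classpath.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but one of its dependencies is missing or broken.
            return false;
        }
    }

    // Returns the subset of names that could not be loaded, in input order.
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<String>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two class names reported in the stack traces above.
        List<String> drivers = Arrays.asList(
            "org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job",
            "org.apache.mahout.clustering.syntheticcontrol.meanshift.Job");
        for (String name : missing(drivers)) {
            System.out.println("not on classpath: " + name);
        }
    }
}
```

If both names are reported as missing even with the examples jar on the classpath, that would suggest the script options refer to classes that are not in this build, rather than a problem with the local Hadoop setup.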
./classify-20newsgroups.sh ->1 [works]
./classify-20newsgroups.sh ->2 [works]
cluster-reuters.sh ->1 [works]
cluster-reuters.sh ->2 [works]
cluster-reuters.sh ->3 [works]
Same error as noted previously in the thread:
cluster-reuters.sh ->4 [0 clusters]
[...]
WARNING: No qualcluster.props found on classpath, will use command-line arguments only
Num clusters: 0; maxDistance: 0.000000
[Dunn Index] First: Infinity
[Davies-Bouldin Index] First: NaN
Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 669 ms (Minutes: 0.01115)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
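The "Infinity" and "NaN" values above look like a side effect of having 0 clusters rather than separate failures. A small sketch of the arithmetic is below. It assumes the Dunn index is computed as (smallest inter-cluster distance) / (largest intra-cluster distance) with the minimum starting at +Infinity, and that the Davies-Bouldin index is a sum divided by the number of clusters. That is only an assumption about how such indices are commonly implemented and is not taken from the Mahout source; EmptyClusterIndices and its methods are hypothetical names.

```java
// Hypothetical sketch, NOT Mahout code: shows how Java double arithmetic
// yields Infinity and NaN when quality indices are computed over zero clusters.
final class EmptyClusterIndices {

    // Dunn-style ratio: smallest inter-cluster distance over largest intra-cluster distance.
    static double dunnLike(double[] interDistances, double[] intraDistances) {
        double minInter = Double.POSITIVE_INFINITY; // stays +Infinity if there are no pairs
        for (double d : interDistances) {
            minInter = Math.min(minInter, d);
        }
        double maxIntra = 0.0; // stays 0.0 if there are no clusters
        for (double d : intraDistances) {
            maxIntra = Math.max(maxIntra, d);
        }
        return minInter / maxIntra;
    }

    // Davies-Bouldin-style average: sum of per-cluster scores over the cluster count.
    static double daviesBouldinLike(double[] perClusterScores) {
        double total = 0.0;
        for (double s : perClusterScores) {
            total += s;
        }
        return total / perClusterScores.length; // 0.0 / 0 when there are no clusters
    }

    public static void main(String[] args) {
        double[] none = new double[0];
        System.out.println("[Dunn-like] " + dunnLike(none, none));
        System.out.println("[Davies-Bouldin-like] " + daviesBouldinLike(none));
    }
}
```

With empty inputs the first method divides +Infinity by 0.0 and the second divides 0.0 by 0, which in Java give Infinity and NaN respectively. So the thing to chase is why no clusters were produced, not the index values themselves.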