Posted to dev@mahout.apache.org by Sisir Koppaka <si...@gmail.com> on 2010/05/02 21:42:29 UTC

Quickstart for kMeans

For GSOC students,
In case anyone is going through the code and having difficulty running
it, I have updated the kMeans page on the wiki
<https://cwiki.apache.org/confluence/display/MAHOUT/k-Means> with
a short quickstart shell script that will run it for you. You can tweak the
settings and reuse it. Reading the code after running it should make the
codebase easier to understand.

If any of you have any tips to share, or have made notes of
quirks-to-be-aware-of, do post them here for everyone's benefit.

Re: Quickstart for kMeans

Posted by Sisir Koppaka <si...@gmail.com>.
Hi,
I've put up a slightly cleaner version of the script on JIRA at
https://issues.apache.org/jira/browse/MAHOUT-390

Best regards,
Sisir Koppaka

On Mon, May 3, 2010 at 11:28 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Sisir,
>
> Thanks for the script.  I think it would be great to open a JIRA issue for
> this and we can check in the shell script under the examples.
>
> I think LDA also has similar tools to download Reuters; we should try to
> reuse them if possible.
>
> On May 2, 2010, at 3:42 PM, Sisir Koppaka wrote:
>
> > For GSOC students,
> > In case anyone was going through the code and finding some difficulty in
> > running stuff, I have updated the kMeans page on the
> > wiki<https://cwiki.apache.org/confluence/display/MAHOUT/k-Means> with
> > a short quickstart shell script that will run it for you. You can tweak the
> > settings and reuse it. Reading the code after running it will hopefully help
> > out in understanding the codebase well.
> >
> > If any of you have any tips to share, or have made notes of
> > quirks-to-be-aware-of, do post them here for everyone's benefit.

Re: Quickstart for kMeans

Posted by Grant Ingersoll <gs...@apache.org>.
Sisir,

Thanks for the script.  I think it would be great to open a JIRA issue for this and we can check in the shell script under the examples.

I think LDA also has similar tools to download Reuters; we should try to reuse them if possible.

On May 2, 2010, at 3:42 PM, Sisir Koppaka wrote:

> For GSOC students,
> In case anyone was going through the code and finding some difficulty in
> running stuff, I have updated the kMeans page on the
> wiki<https://cwiki.apache.org/confluence/display/MAHOUT/k-Means> with
> a short quickstart shell script that will run it for you. You can tweak the
> settings and reuse it. Reading the code after running it will hopefully help
> out in understanding the codebase well.
> 
> If any of you have any tips to share, or have made notes of
> quirks-to-be-aware-of, do post them here for everyone's benefit.



Re: Quickstart for kMeans

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Indeed, the wiki is pretty out of date in some areas and the actual APIs
have changed (since 2008!). For users wishing to launch clustering jobs
using trunk, I suggest checking out TestCDbwEvaluator and
TestClusterDumper in utils, which employ the latest versions. These do not
use the command line for execution but call the runJob methods in the
respective driver classes. Once I get my wiki karma back I will go over
them all again and update for consistency.

On 5/2/10 12:42 PM, Sisir Koppaka wrote:
> For GSOC students,
> In case anyone was going through the code and finding some difficulty in
> running stuff, I have updated the kMeans page on the
> wiki<https://cwiki.apache.org/confluence/display/MAHOUT/k-Means>  with
> a short quickstart shell script that will run it for you. You can tweak the
> settings and reuse it. Reading the code after running it will hopefully help
> out in understanding the codebase well.
>
> If any of you have any tips to share, or have made notes of
> quirks-to-be-aware-of, do post them here for everyone's benefit.


Re: Quickstart for kMeans

Posted by Sisir Koppaka <si...@gmail.com>.
Two more useful resources for getting started with the code:
http://lucene.grantingersoll.com/2010/02/16/trijug-intro-to-mahout-slides-and-demo-examples/
http://www.lucenebootcamp.com/lucene-boot-camp-preclass-training/

On Mon, May 3, 2010 at 1:14 AM, Robin Anil <ro...@gmail.com> wrote:

> Nice work!
>
> On Mon, May 3, 2010 at 1:12 AM, Sisir Koppaka <sisir.koppaka@gmail.com> wrote:
>
> > For GSOC students,
> > In case anyone was going through the code and finding some difficulty in
> > running stuff, I have updated the kMeans page on the
> > wiki<https://cwiki.apache.org/confluence/display/MAHOUT/k-Means> with
> > a short quickstart shell script that will run it for you. You can tweak the
> > settings and reuse it. Reading the code after running it will hopefully help
> > out in understanding the codebase well.
> >
> > If any of you have any tips to share, or have made notes of
> > quirks-to-be-aware-of, do post them here for everyone's benefit.
> >
>

Re: Quickstart for kMeans

Posted by Robin Anil <ro...@gmail.com>.
Nice work!

On Mon, May 3, 2010 at 1:12 AM, Sisir Koppaka <si...@gmail.com> wrote:

> For GSOC students,
> In case anyone was going through the code and finding some difficulty in
> running stuff, I have updated the kMeans page on the
> wiki<https://cwiki.apache.org/confluence/display/MAHOUT/k-Means> with
> a short quickstart shell script that will run it for you. You can tweak the
> settings and reuse it. Reading the code after running it will hopefully help
> out in understanding the codebase well.
>
> If any of you have any tips to share, or have made notes of
> quirks-to-be-aware-of, do post them here for everyone's benefit.
>