Posted to user@mahout.apache.org by Timothy Potter <th...@gmail.com> on 2011/04/07 19:32:49 UTC

Re: Need a little help with using SVD

Thanks for all the help! I was finally able to go through the process of
running the SVD job and then preparing the output for clustering using
matrixmult. A patch has been posted for MAHOUT-639. I also formalized this
thread into an example on the wiki; see "Example: SVD of ASF Mail
Archives on Amazon Elastic MapReduce" at
https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
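Preparing SVD output for clustering comes down to a matrix multiplication: each document row is projected onto the k singular vectors, giving an n x k matrix that a clustering job can consume. The sketch below is a minimal in-memory illustration under assumptions made here. Dense arrays stand in for Mahout's distributed matrices, each row of `basis` is assumed to hold one right singular vector, and `SvdProjection` and `project` are hypothetical names. It is not Mahout's matrixmult job, whose input conventions (for example, whether an input must be transposed first) are not spelled out in this thread.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: project each row of a document-term matrix A (n x m)
 * onto k basis vectors to get an n x k matrix for clustering.
 * Not Mahout's distributed MatrixMultiplicationJob.
 */
public class SvdProjection {

    /**
     * Returns A * V_k, assuming row j of basis holds the j-th right singular
     * vector v_j (length m), so entry (i, j) is the dot product a_i . v_j.
     */
    public static double[][] project(double[][] a, double[][] basis) {
        int n = a.length;
        int k = basis.length;
        double[][] reduced = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double dot = 0.0;
                for (int t = 0; t < a[i].length; t++) {
                    dot += a[i][t] * basis[j][t];
                }
                reduced[i][j] = dot;
            }
        }
        return reduced;
    }

    public static void main(String[] args) {
        // Three documents over four terms (toy data for illustration only).
        double[][] a = {
            {1, 0, 2, 0},
            {0, 3, 0, 1},
            {1, 1, 1, 1}
        };
        // Two orthonormal vectors (unit axes, so the result is easy to check by hand).
        double[][] basis = {
            {1, 0, 0, 0},
            {0, 0, 1, 0}
        };
        // prints [[1.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
        System.out.println(Arrays.deepToString(project(a, basis)));
    }
}
```

Each reduced row has k entries instead of m, which is what makes the downstream clustering step cheaper.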

My next step is to run the k-Means job on the new matrix and then start
working on the task of creating a "clean command line integration from text
=> hashed vector => clusters" ...

Tim

On Wed, Mar 30, 2011 at 3:39 PM, Lance Norskog <go...@gmail.com> wrote:

> Also, many Vector implementations have their own Element class. Will
> each need a custom comparator?
>
> On Tue, Mar 29, 2011 at 7:53 PM, Jake Mannix <ja...@gmail.com> wrote:
> > Hmmm... maybe I'm being paranoid, but IIRC, a custom Comparator on
> > Vector.Element instances will fail to produce correct results because
> > Vector.iterate() reuses Element object instances. It'll be fast, yes, but
> > does your code pass the current unit tests?
> >
> > I ask because I think I've tried this before... :)
> >
> > On Mar 29, 2011 5:39 PM, "Timothy Potter" <th...@gmail.com> wrote:
> >
> > Hi Jake,
> >
> > Success! I implemented a basic Element Comparator and sorted the random
> > vector data before adding it to the SequentialAccessSparseVector as you
> > recommended, and the TransposeJob ripped through my data in about 5 mins!
> > My implementation is basic at this point, relying on a custom
> > Comparator<Element> and Arrays.sort() rather than the optimal way you
> > suggested, but it gets the job done and is actually pretty fast ... I'll
> > post a patch after I've added some test cases for this.
> >
> > Thanks again for your help.
> >
> > Cheers,
> > Tim
> >
> > On Tue, Mar 29, 2011 at 7:29 AM, Jake Mannix <ja...@gmail.com> wrote:
> >> > Riding in the cab ...
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
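Jake's warning about reused Element instances can be illustrated with a small, self-contained Java sketch. It assumes an iterator that, as Jake describes, hands back the same Element object on every call; `Element`, `reusingIterator`, `Entry`, and `SortedSparse` are hypothetical stand-ins written for this example, not Mahout classes, and this is not the "optimal way" Jake suggested (the thread does not describe it). Keeping the returned Element references leaves every reference pointing at the last entry, whereas copying each (index, value) pair into a fresh immutable object before calling Arrays.sort() yields a correct index order that can then be appended sequentially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class SortThenAppend {

    /** Hypothetical mutable element standing in for a vector element. */
    public static final class Element {
        public int index;
        public double value;
    }

    /** Hypothetical iterator that returns the SAME Element instance on every call. */
    public static Iterator<Element> reusingIterator(final int[] indices, final double[] values) {
        final Element shared = new Element();
        return new Iterator<Element>() {
            private int pos = 0;
            @Override public boolean hasNext() { return pos < indices.length; }
            @Override public Element next() {
                shared.index = indices[pos]; // overwrites what the caller saw last time
                shared.value = values[pos];
                pos++;
                return shared;
            }
        };
    }

    /** Immutable copy of one entry; safe to keep after the iterator moves on. */
    public static final class Entry {
        public final int index;
        public final double value;
        public Entry(int index, double value) { this.index = index; this.value = value; }
    }

    /** Copies each element's primitives, then sorts the copies by index. */
    public static Entry[] sortedCopies(Iterator<Element> it) {
        List<Entry> copies = new ArrayList<>();
        while (it.hasNext()) {
            Element e = it.next();
            copies.add(new Entry(e.index, e.value)); // copy the values, never keep the reference
        }
        Entry[] arr = copies.toArray(new Entry[0]);
        Arrays.sort(arr, Comparator.comparingInt((Entry e) -> e.index));
        return arr;
    }

    /** Hypothetical append-only sparse layout: parallel arrays kept in index order. */
    public static final class SortedSparse {
        public int[] indices;
        public double[] values;
        public int size = 0;

        public SortedSparse(int capacity) {
            indices = new int[capacity];
            values = new double[capacity];
        }

        /** Amortized O(1) when indices arrive in increasing order; rejects anything else. */
        public void append(int index, double value) {
            if (size > 0 && index <= indices[size - 1]) {
                throw new IllegalArgumentException("index out of order: " + index);
            }
            if (size == indices.length) {
                indices = Arrays.copyOf(indices, size * 2 + 1);
                values = Arrays.copyOf(values, size * 2 + 1);
            }
            indices[size] = index;
            values[size] = value;
            size++;
        }
    }

    public static void main(String[] args) {
        int[] idx = {7, 2, 9, 4};
        double[] val = {0.7, 0.2, 0.9, 0.4};

        // Pitfall: keeping the references gives four pointers to one object.
        List<Element> kept = new ArrayList<>();
        Iterator<Element> it = reusingIterator(idx, val);
        while (it.hasNext()) {
            kept.add(it.next());
        }
        System.out.println("first kept index: " + kept.get(0).index); // prints 4, not 7

        // Fix: sort copies, then append in increasing index order.
        SortedSparse v = new SortedSparse(2);
        for (Entry e : sortedCopies(reusingIterator(idx, val))) {
            v.append(e.index, e.value);
        }
        System.out.println(Arrays.toString(Arrays.copyOf(v.indices, v.size))); // prints [2, 4, 7, 9]
    }
}
```

Because the sort runs on plain (index, value) copies, a single comparator suffices here no matter which Element class produced the entries, which is one possible answer to Lance's question, assuming the extra copy is affordable for the data sizes involved. Appending in increasing index order keeps every write at the end of the parallel arrays instead of shifting earlier entries, which is the usual reason sorting first speeds up a sequential-access layout.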