Posted to user@mahout.apache.org by Chirag Lakhani <cl...@zaloni.com> on 2014/03/02 19:31:06 UTC

sparsification of a Mahout vector

Hi,

I was wondering if there is a simple way to sparsify a vector in Mahout.  I
basically have an n-dimensional vector (currently a DenseVector) and I want
to develop a method that sparsifies it by keeping only the largest s values
of the vector and setting the rest to 0.  Is there a simple solution to
this given all that is included in the Vector class or do I need to create
my own method?

Chirag

-- 

*Chirag Lakhani*

Data Scientist

Zaloni, Inc. | www.zaloni.com

633 Davis Dr., Suite 200

Durham, NC 27713
e: clakhani@zaloni.com
p: 919.602.4965 x7020

Re: sparsification of a Mahout vector

Posted by Ted Dunning <te...@gmail.com>.
Chirag,

Mahout has no ready-made method for this, but there are components that
can help.  For instance, OnlineSummarizer can estimate a particular
quantile of the values.  Filling it by iterating over the vector is easy
enough:

        Vector v;  // original data
        OnlineSummarizer s = new OnlineSummarizer();
        for (Vector.Element e : v.all()) {
            s.add(e.get());
        }

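        // pick any cutoff you like; 0.99 keeps roughly the largest 1% of
        // the values (for m out of n values, use 1.0 - (double) m / n)
        double cutoff = s.quantile(0.99);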

Then you can use this cutoff to copy only the items you need:

        Vector r = new RandomAccessSparseVector(v.size());
        for (Vector.Element e : v.all()) {
            double vi = e.get();
            if (vi > cutoff) {
                r.set(e.index(), vi);
            }
        }

Note that if you want a result that is actually sparse, you have to
perform a selective copy.  Setting elements of a DenseVector to zero
leaves the amount of storage unchanged, because a dense vector stores
every element, zero or not.  Only by copying selectively to a new vector
of a sparse type can you get the desired effect.
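
If an exact count matters more than a quantile-based cutoff, one option is
to select the s largest entries directly with a bounded min-heap.  The
following is only a minimal sketch in plain Java with no Mahout dependency.
The class name TopSSparsifier and the method keepLargest are made up for
illustration, the input is a double[] rather than a Vector, and the result
is a TreeMap from index to value standing in for a sparse vector.  With
Mahout, the same loop could presumably write into a
RandomAccessSparseVector instead.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout.
class TopSSparsifier {

    // Returns index -> value for the s largest entries of v
    // (largest by value, not by magnitude).
    static Map<Integer, Double> keepLargest(double[] v, int s) {
        Map<Integer, Double> kept = new TreeMap<>();
        if (s <= 0) {
            return kept;
        }
        // Min-heap of indices ordered by value; the head is the
        // smallest entry currently retained.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((a, b) -> Double.compare(v[a], v[b]));
        for (int i = 0; i < v.length; i++) {
            if (heap.size() < s) {
                heap.add(i);
            } else if (v[i] > v[heap.peek()]) {
                // v[i] beats the smallest retained entry, so swap it in.
                heap.poll();
                heap.add(i);
            }
        }
        // Selective copy: only the retained indices are stored.
        for (int i : heap) {
            kept.put(i, v[i]);
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] v = {0.5, 3.0, -1.0, 7.5, 2.0, 0.0};
        System.out.println(keepLargest(v, 2));
    }
}
```

This takes O(n log s) time and retains exactly min(s, n) entries.  Ties at
the boundary are broken arbitrarily.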




