You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@mahout.apache.org by "Niederberger Thomas (tniederb@hsr.ch)" <tn...@hsr.ch> on 2013/10/01 16:34:59 UTC

K-SVD

Hello everyone,
I'm currently working on my master thesis in compressed sensing for video.
Part of my project might be to learn a dictionary from a large collection of videos. The typical algorithm to use for this task is called K-SVD.
K-SVD is very similar to K-Means but there are two key differences:
 - A data point is assigned to several 'clusters' (not one)
 - Instead of finding the cluster by taking the mean of all data points the cluster is found by finding the first principal component using an SVD (hence the name).
The original reference for the algorithm is "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation" from M. Aharon.

It seems no one has implemented this algorithm yet on Mahout.
Since I have no experience with Mahout/Hadoop I wondered if it would be difficult to implement this based on the available implementation of K-Means?
Is anybody interested in this and could point me towards the right direction for an implementation?

Best,
Thomas

Re: K-SVD

Posted by Ted Dunning <te...@gmail.com>.

Yes.  This should be relatively easy to build in Mahout.

There are several decent courses of action:

1) use the streaming k-means to reduce the data to a sketch of the original
data and then apply an in-memory k-svd algorithm to that.  This could be
really, really fast and has a reasonable chance of working well.

2) pull the streaming k-means apart and use the approximate search
mechanisms in there to implement a new version of k-svd.

3) build an iterative implementation on top of Spark or Giraph using Mahout
vectors.

One reservation that I might have is that if your data are dense, you might
have better performance using something like JBlas.  Mahout works well for
sparse data, but is a bit sub-par on dense performance.

On Tue, Oct 1, 2013 at 7:34 AM, Niederberger Thomas (tniederb@hsr.ch) <
tniederb@hsr.ch> wrote:

> Hello everyone,
> I'm currently working on my master thesis in compressed sensing for video.
> Part of my project might be to learn a dictionary from a large collection
> of videos. The typical algorithm to use for this task is called K-SVD.
> K-SVD is very similar to K-Means but there are two key differences:
>  - A data point is assigned to several 'clusters' (not one)
>  - Instead of finding the cluster by taking the mean of all data points
> the cluster is found by finding the first principal component using an SVD
> (hence the name).
> The original reference for the algorithm is "K-SVD: An Algorithm for
> Designing Overcomplete Dictionaries for Sparse Representation" from M.
> Aharon.
>
> It seems no one has implemented this algorithm yet on Mahout.
> Since I have no experience with Mahout/Hadoop I wondered if it would be
> difficult to implement this based on the available implementation of
> K-Means?
> Is anybody interested in this and could point me towards the right
> direction for an implementation?
>
> Best,
> Thomas
>