Posted to dev@mahout.apache.org by "Shannon Quinn (JIRA)" <ji...@apache.org> on 2010/10/03 01:19:33 UTC

[jira] Commented: (MAHOUT-518) Implement Affinity Preprocessing for Eigencuts and Spectral KMeans

    [ https://issues.apache.org/jira/browse/MAHOUT-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917263#action_12917263 ] 

Shannon Quinn commented on MAHOUT-518:
--------------------------------------

This is worth discussing: Eigencuts (like spectral clustering algorithms as a whole) is designed to work specifically on images, but in theory it can be used for any general-purpose clustering, provided the data format is correct. The primary issue here, however, is that the input affinity matrix A currently must be symmetric (and sparse, but that's the Mahout requirement). Symmetry is easy to achieve with images: the general rule of thumb is that each pixel's affinity neighborhood consists of the 8 pixels around it, which makes the matrix both sparse and symmetric. Were the data points instead drawn from arbitrary distributions (say, a bunch of points in Euclidean space), you can picture cases where the nearest-neighbor neighborhoods aren't symmetric: point b can be among a's nearest neighbors without a being among b's.
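
To make the symmetry argument concrete, here is a minimal sketch. The class and method names are hypothetical and not part of Mahout, and the Gaussian intensity similarity, the sigma value, and the (row, column, value) tuple layout are all assumptions chosen for illustration. It builds 8-neighbor affinities for a tiny grayscale image, and then shows a 1-nearest-neighbor relation on three points that is not symmetric.

```java
import java.util.ArrayList;
import java.util.List;

class AffinitySketch {

    /** Gaussian similarity between two pixel intensities (sigma is an illustrative choice). */
    static double affinity(double a, double b, double sigma) {
        double d = a - b;
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }

    /** Emits {i, j, value} tuples linking each pixel to its (up to) 8 surrounding pixels. */
    static List<double[]> imageAffinities(double[][] img, double sigma) {
        int rows = img.length, cols = img[0].length;
        List<double[]> tuples = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        int nr = r + dr, nc = c + dc;
                        boolean self = (dr == 0 && dc == 0);
                        if (self || nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                        double v = affinity(img[r][c], img[nr][nc], sigma);
                        // Pixels are numbered row by row: index = r * cols + c.
                        tuples.add(new double[] {r * cols + c, nr * cols + nc, v});
                    }
                }
            }
        }
        return tuples;
    }

    /** True if every {i, j, v} tuple is matched by a {j, i, v} tuple. */
    static boolean isSymmetric(List<double[]> tuples, int n) {
        double[][] m = new double[n][n];
        for (double[] t : tuples) m[(int) t[0]][(int) t[1]] = t[2];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (m[i][j] != m[j][i]) return false;
        return true;
    }

    /** Index of the closest other point on a line (a 1-nearest-neighbor query). */
    static int nearest(double[] pts, int i) {
        int best = -1;
        for (int j = 0; j < pts.length; j++) {
            if (j == i) continue;
            if (best < 0 || Math.abs(pts[j] - pts[i]) < Math.abs(pts[best] - pts[i])) best = j;
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] img = {{0, 10, 20}, {30, 40, 50}, {60, 70, 80}};
        List<double[]> tuples = imageAffinities(img, 25.0);
        System.out.println("tuples: " + tuples.size() + ", symmetric: " + isSymmetric(tuples, 9));

        // Points at 0, 1 and 3: the point at 3 picks the point at 1,
        // but the point at 1 picks the point at 0.
        double[] pts = {0.0, 1.0, 3.0};
        System.out.println("nearest(2) = " + nearest(pts, 2) + ", nearest(1) = " + nearest(pts, 1));
    }
}
```

The grid case is symmetric by construction, because "q is one of the 8 pixels around p" implies the reverse and the similarity depends only on the squared intensity difference. The nearest-neighbor case has no such guarantee, which is the situation general (non-image) input would have to handle.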

There are techniques for converting non-symmetric input data into lower-rank approximations that are fully symmetric, but that's probably something we should tackle later. (There's a section on this problem specifically in Dr. Chennubhotla's thesis, the one that covers Eigencuts; that's probably a good place to start.) My recommendation for this ticket is to allow the algorithm to process raw images; generalized input data (which is not necessarily symmetric) can come later.

My only concern regarding images is that, for most academic purposes, this algorithm has been run on input images in PGM format. The problem is that Java doesn't have a native PGM image reader, which is why I'm still tweaking the Eigencuts examples. If anyone knows of something that would help with this, please let me know.
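
For what it's worth, the PGM format is simple enough that a hand-rolled reader may be sufficient for the examples. The following is a minimal sketch, not an existing Mahout or JDK API: the class name PgmSketch and its parse method are hypothetical. It assumes a single-image file using the plain "P2" or raw "P5" variant, with header comments appearing only between tokens, and it does not validate samples against the declared maximum value.

```java
import java.nio.charset.StandardCharsets;

class PgmSketch {
    private final byte[] data;
    private int pos = 0;

    private PgmSketch(byte[] data) { this.data = data; }

    /** Reads the next whitespace-delimited token, skipping '#' comments. */
    private String token() {
        StringBuilder sb = new StringBuilder();
        while (pos < data.length) {
            int c = data[pos++] & 0xFF;
            if (c == '#') {
                // A comment runs to the end of the line; treat it like whitespace.
                while (pos < data.length && data[pos++] != '\n') { }
                if (sb.length() > 0) break;
            } else if (Character.isWhitespace(c)) {
                if (sb.length() > 0) break; // token complete, one delimiter consumed
            } else {
                sb.append((char) c);
            }
        }
        if (sb.length() == 0) throw new IllegalArgumentException("unexpected end of PGM data");
        return sb.toString();
    }

    /** Reads one raw raster byte as an unsigned value. */
    private int sample() {
        if (pos >= data.length) throw new IllegalArgumentException("truncated PGM raster");
        return data[pos++] & 0xFF;
    }

    /** Parses a P2 (ASCII) or P5 (binary) PGM image into pixels[row][column]. */
    static int[][] parse(byte[] data) {
        PgmSketch p = new PgmSketch(data);
        String magic = p.token();
        boolean binary = magic.equals("P5");
        if (!binary && !magic.equals("P2")) {
            throw new IllegalArgumentException("not a PGM file: " + magic);
        }
        int width = Integer.parseInt(p.token());
        int height = Integer.parseInt(p.token());
        int maxVal = Integer.parseInt(p.token());
        int[][] pixels = new int[height][width];
        for (int r = 0; r < height; r++) {
            for (int c = 0; c < width; c++) {
                if (binary) {
                    // Samples take two bytes, most significant first, when maxVal > 255.
                    int hi = maxVal > 255 ? p.sample() : 0;
                    pixels[r][c] = (hi << 8) | p.sample();
                } else {
                    pixels[r][c] = Integer.parseInt(p.token());
                }
            }
        }
        return pixels;
    }

    public static void main(String[] args) {
        String ascii = "P2\n# tiny example\n3 2\n255\n0 128 255\n64 32 16\n";
        int[][] px = parse(ascii.getBytes(StandardCharsets.US_ASCII));
        System.out.println(px.length + " rows, " + px[0].length + " cols, px[1][0] = " + px[1][0]);
    }
}
```

To read a file from disk, the bytes could be loaded with java.nio.file.Files.readAllBytes and passed to parse. The resulting pixel grid is the kind of input an 8-neighbor affinity step would consume.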

> Implement Affinity Preprocessing for Eigencuts and Spectral KMeans
> ------------------------------------------------------------------
>
>                 Key: MAHOUT-518
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-518
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Jeff Eastman
>             Fix For: 0.5
>
>
> The input format for these clustering algorithms is currently affinity tuples. It would be very nice to have this process automated. Marking for 0.5 as this will require some investigation.
