Posted to dev@mahout.apache.org by "Pallavi Palleti (JIRA)" <ji...@apache.org> on 2008/08/11 11:35:50 UTC

[jira] Created: (MAHOUT-74) Fuzzy K-Means clustering

Fuzzy K-Means clustering
------------------------

                 Key: MAHOUT-74
                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
             Project: Mahout
          Issue Type: New Feature
            Reporter: Pallavi Palleti
         Attachments: mahout-74.patch

The Fuzzy K-Means clustering algorithm is an extension of the traditional K-Means clustering algorithm that performs soft clustering: each point belongs to every cluster with some degree of membership, rather than to exactly one cluster.

More details about fuzzy k-means can be found here: http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering

I have implemented a Fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans.
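
For readers unfamiliar with soft clustering, here is a minimal sketch of the standard fuzzy c-means membership computation described in the linked article. It is not the code from the attached patch: the class name FuzzyMembership and the method memberships are hypothetical, and the sketch assumes the distances from the point to each cluster center have already been computed.

```java
public class FuzzyMembership {

    /**
     * Returns the degree of membership of one point in each of k clusters,
     * using the standard fuzzy c-means formula
     *   u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
     * where d_i is the distance to center i and m > 1 is the fuzziness.
     * Hypothetical helper for illustration only.
     */
    public static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("m must be greater than 1");
        }
        int k = distances.length;
        double[] u = new double[k];
        // A point sitting exactly on a center belongs fully to the first
        // such center; this also avoids dividing by zero below.
        for (int i = 0; i < k; i++) {
            if (distances[i] == 0.0) {
                u[i] = 1.0;
                return u;
            }
        }
        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < k; i++) {
            double sum = 0.0;
            for (int j = 0; j < k; j++) {
                sum += Math.pow(distances[i] / distances[j], exponent);
            }
            u[i] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        // A point at distance 1 from the first center and 3 from the second
        double[] u = memberships(new double[] {1.0, 3.0}, 2.0);
        System.out.println(u[0] + " " + u[1]);
    }
}
```

With m = 2 the point above gets memberships of roughly 0.9 and 0.1, and the memberships of any point sum to 1. Plain K-Means would assign the point entirely to the nearer center.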

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Palleti, Pallavi" <pa...@corp.aol.com>.
Great. Thanks Grant for the modifications.

-----Original Message-----
From: Grant Ingersoll (JIRA) [mailto:jira@apache.org] 
Sent: Thursday, August 21, 2008 7:09 PM
To: mahout-dev@lucene.apache.org
Subject: [jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering


     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-74:
----------------------------------

    Attachment: MAHOUT-74.patch

Looking pretty good, Pallavi.  I modified it slightly so that m is set just via the JobConf like the other values.  I think we are in pretty good shape and I will commit soon.  I also made m a float.  Looking at the wiki link you have there, I don't see any reason why m should be restricted to an int.

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-74.patch, MAHOUT-74.patch, mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Pallavi Palleti (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pallavi Palleti updated MAHOUT-74:
----------------------------------

    Component/s: Clustering

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>         Attachments: mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Resolved: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved MAHOUT-74.
-----------------------------------

    Resolution: Fixed

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-74.patch, MAHOUT-74.patch, mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Pallavi Palleti (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pallavi Palleti updated MAHOUT-74:
----------------------------------

    Attachment: mahout-74.patch

I have implemented a Fuzzy K-Means prototype and tests. Please review the code.

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Pallavi Palleti
>         Attachments: mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Commented: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623018#action_12623018 ] 

Grant Ingersoll commented on MAHOUT-74:
---------------------------------------

Couple of questions:

1. What is urlCount on SoftCluster for?

2. Shouldn't SoftCluster.m be non-final (and configurable)?

3. It seems like there should be an opportunity for more inheritance/overlap, etc. with the K-Means clustering, but I'd have to think about it a bit more.

The Wikipedia article implies that m == 1 is "similar" to KMeans. Is it the case that we could make KMeans just a special case of fuzzy k-means by choosing the parameters appropriately?

Otherwise, the tests pass and it looks to be in pretty good shape.  Would be cool to have an example added, but not required for this patch to go in.
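
On the m == 1 question above, a small sketch may help. In the standard fuzzy c-means formula the exponent is 2 / (m - 1), so m == 1 itself is undefined, but as m approaches 1 from above the memberships approach the hard 0/1 assignment of K-Means. The class FuzzinessLimit and its method are hypothetical and are not taken from the patch; the sketch only covers the two-cluster case.

```java
public class FuzzinessLimit {

    /**
     * Membership of a point in the first of two clusters, given its
     * distances d0 and d1 to the two centers (both assumed non-zero).
     * Two-cluster form of u_0 = 1 / sum_j (d_0 / d_j)^(2 / (m - 1)).
     * Hypothetical helper for illustration only.
     */
    public static double membershipInFirst(double d0, double d1, double m) {
        if (m <= 1.0) {
            // The exponent 2 / (m - 1) is undefined at m == 1
            throw new IllegalArgumentException("m must be greater than 1");
        }
        double exponent = 2.0 / (m - 1.0);
        return 1.0 / (1.0 + Math.pow(d0 / d1, exponent));
    }

    public static void main(String[] args) {
        // The point is twice as far from the second center as from the first
        double[] fuzziness = {3.0, 2.0, 1.5, 1.1, 1.01};
        for (double m : fuzziness) {
            System.out.println("m=" + m + " u0=" + membershipInFirst(1.0, 2.0, m));
        }
    }
}
```

The printed membership in the nearer cluster grows toward 1 as m shrinks toward 1, which is the sense in which K-Means behaves like a limiting case rather than an exact parameter setting.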



> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>         Attachments: mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-74:
----------------------------------

    Attachment: MAHOUT-74.patch

Looking pretty good, Pallavi.  I modified it slightly so that m is set just via the JobConf like the other values.  I think we are in pretty good shape and I will commit soon.  I also made m a float.  Looking at the wiki link you have there, I don't see any reason why m should be restricted to an int.

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-74.patch, MAHOUT-74.patch, mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Commented: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624904#action_12624904 ] 

Grant Ingersoll commented on MAHOUT-74:
---------------------------------------

Committed revision 688122.

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-74.patch, MAHOUT-74.patch, mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Commented: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Pallavi Palleti (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623252#action_12623252 ] 

Pallavi Palleti commented on MAHOUT-74:
---------------------------------------

Hi Grant,
  urlCount is an unnecessary variable. It was added by mistake.
  SoftCluster.m should be configurable. I am sorry, I forgot to modify it.




> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>         Attachments: mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-74:
----------------------------------

    Attachment: mahout-74.patch

Here's an update that compiles against trunk.



> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>         Attachments: mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-74:
----------------------------------

    Fix Version/s: 0.1
         Priority: Minor  (was: Major)

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Assigned: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned MAHOUT-74:
-------------------------------------

    Assignee: Grant Ingersoll

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>         Attachments: mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans



[jira] Updated: (MAHOUT-74) Fuzzy K-Means clustering

Posted by "Pallavi Palleti (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pallavi Palleti updated MAHOUT-74:
----------------------------------

    Attachment: MAHOUT-74.patch

Modified the code to remove urlCount (an unnecessary variable) and made "m" configurable. Also made the distance measure class configurable.

> Fuzzy K-Means clustering
> ------------------------
>
>                 Key: MAHOUT-74
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-74
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Pallavi Palleti
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 0.1
>
>         Attachments: MAHOUT-74.patch, mahout-74.patch, mahout-74.patch
>
>
> Fuzzy KMeans clustering algorithm is an extension to traditional K Means clustering algorithm and performs soft clustering.
> More details about fuzzy k-means can be found here :http://en.wikipedia.org/wiki/Data_clustering#Fuzzy_c-means_clustering
> I have implemented fuzzy K-Means prototype and tests in org.apache.mahout.clustering.fuzzykmeans
