You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/09 13:56:39 UTC

[jira] [Resolved] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

     [ https://issues.apache.org/jira/browse/SPARK-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-4510.
------------------------------
    Resolution: Won't Fix

> Add k-medoids Partitioning Around Medoids (PAM) algorithm
> ---------------------------------------------------------
>
>                 Key: SPARK-4510
>                 URL: https://issues.apache.org/jira/browse/SPARK-4510
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Fan Jiang
>            Assignee: Fan Jiang
>              Labels: clustering, features
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> PAM (k-medoids) is more robust to noise and outliers as compared to k-means because it minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances. A medoid can be defined as the object of a cluster, whose average dissimilarity to all the objects in the cluster is minimal i.e. it is a most centrally located point in the cluster.
> The most common realisation of k-medoid clustering is the Partitioning Around Medoids (PAM) algorithm and is as follows:
> Initialize: randomly select (without replacement) k of the n data points as the medoids
> Associate each data point to the closest medoid. ("closest" here is defined using any valid distance metric, most commonly Euclidean distance, Manhattan distance or Minkowski distance)
> For each medoid m
> For each non-medoid data point o
> Swap m and o and compute the total cost of the configuration
> Select the configuration with the lowest cost.
> Repeat steps 2 to 4 until there is no change in the medoid.
> The new feature for MLlib will contain 5 new files
> /main/scala/org/apache/spark/mllib/clustering/PAM.scala
> /main/scala/org/apache/spark/mllib/clustering/PAMModel.scala
> /main/scala/org/apache/spark/mllib/clustering/LocalPAM.scala
> /test/scala/org/apache/spark/mllib/clustering/PAMSuite.scala
> /main/scala/org/apache/spark/examples/mllib/KMedoids.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org