Posted to issues@madlib.apache.org by "Frank McQuillan (Jira)" <ji...@apache.org> on 2019/10/01 19:09:00 UTC
[jira] [Closed] (MADLIB-1380) Select number of centroids in k-means
[ https://issues.apache.org/jira/browse/MADLIB-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan closed MADLIB-1380.
-----------------------------------
> Select number of centroids in k-means
> -------------------------------------
>
> Key: MADLIB-1380
> URL: https://issues.apache.org/jira/browse/MADLIB-1380
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: k-Means Clustering
> Reporter: Frank McQuillan
> Priority: Major
> Fix For: v1.17
>
>
> {code}
> kmeans_random( rel_source,
> expr_point,
> k, -- can be a single value, as it is now, or an array of k values
> fn_dist, -- optional
> agg_centroid, -- optional
> max_num_iterations, -- optional
> min_frac_reassigned, -- optional
> k_selection_algorithm -- optional (only applies if 'k' parameter is an array with multiple k values)
> )
> {code}
> {code}
> kmeanspp( rel_source,
> expr_point,
> k, -- can be a single value, as it is now, or an array of k values
> fn_dist, -- optional
> agg_centroid, -- optional
> max_num_iterations, -- optional
> min_frac_reassigned, -- optional
> seeding_sample_ratio, -- optional
> k_selection_algorithm -- optional (only applies if 'k' parameter is an array with multiple k values)
> )
> {code}
> {code}
> k
> INTEGER or INTEGER[]. The number of centroids to calculate. Can be a single value
> or an array of k values to explore. If an array of k values is given, the parameter
> 'k_selection_algorithm' determines the evaluation method.
> {code}
> {code}
> k_selection_algorithm (optional)
> TEXT, default: 'elbow'. Method to evaluate number of centroids k.
> Only applies if the parameter 'k' is an array with multiple k values.
> Currently two approaches are supported: 'elbow' and 'silhouette'.
> The text can be any subset of the strings; for example, 'silh' will use the silhouette method.
> {code}
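The abbreviation rule described above can be sketched in a few lines of Python. This is only an illustration: resolve_k_selection and SUPPORTED_METHODS are hypothetical names, not part of MADlib, and the sketch assumes that "any subset of the strings" means a non-empty leading prefix compared case-insensitively. The issue does not spell out the exact matching rule.

```python
# Hypothetical helper, not part of MADlib; names are illustrative only.
SUPPORTED_METHODS = ("elbow", "silhouette")


def resolve_k_selection(text="elbow"):
    """Map a possibly abbreviated method name to its full form.

    Assumption: an abbreviation is a non-empty leading prefix of a
    supported name, compared case-insensitively.
    """
    key = text.strip().lower()
    if not key:
        raise ValueError("k_selection_algorithm must not be empty")
    # Collect every supported method that starts with the given text.
    matches = [m for m in SUPPORTED_METHODS if m.startswith(key)]
    if len(matches) != 1:
        raise ValueError(
            f"unknown or ambiguous k_selection_algorithm: {text!r}"
        )
    return matches[0]
```

Under that assumption, resolve_k_selection('silh') returns 'silhouette', and calling it with no argument returns the proposed default, 'elbow'.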
> e.g.,
> {code}
> SELECT * FROM madlib.kmeanspp (
> 'km_sample', -- rel_source
> 'points', -- expr_point
> ARRAY[2, 4, 6, 8, 10], -- k
> 'madlib.squared_dist_norm2', -- fn_dist
> 'madlib.avg', -- agg_centroid
> 20, -- max_num_iterations
> 0.001, -- min_frac_reassigned
> 1.0, -- seeding_sample_ratio
> 'elbow' -- k_selection_algorithm
> );
> {code}
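To make the 'elbow' choice in the call above more concrete, here is a minimal Python sketch of one common elbow heuristic: pick the k where the cost curve bends most sharply, measured by its second difference. The issue does not say how MADlib would compute the elbow, so the function elbow_k, its input format, and the toy costs are all assumptions made for illustration.

```python
def elbow_k(costs_by_k):
    """Pick the k at the sharpest bend of the cost curve.

    costs_by_k maps each candidate k to its clustering cost, e.g. the
    sum of squared distances from every point to its closest centroid.
    Assumption: the candidate k values are evenly spaced, so a plain
    second difference is a fair measure of curvature.
    """
    ks = sorted(costs_by_k)
    if len(ks) < 3:
        raise ValueError("need at least three k values to locate an elbow")
    best_k, best_bend = None, float("-inf")
    # Only interior candidates have neighbours on both sides.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = costs_by_k[prev_k] - 2 * costs_by_k[k] + costs_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up costs for the k values used in the example; not real output.
toy_costs = {2: 100.0, 4: 40.0, 6: 30.0, 8: 25.0, 10: 22.0}
print(elbow_k(toy_costs))
```

With these toy costs the sharpest bend is at k = 4, where the cost stops dropping steeply. The smallest and largest candidates can never be selected by this heuristic, which is one reason to pass a reasonably wide range of k values.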