Posted to issues@commons.apache.org by "Amol Singh (JIRA)" <ji...@apache.org> on 2016/05/21 05:19:14 UTC

[jira] [Created] (MATH-1367) DBSCAN Implementation does not count the seed point itself as part of its neighbors count

Amol Singh created MATH-1367:
--------------------------------

             Summary: DBSCAN Implementation does not count the seed point itself as part of its neighbors count
                 Key: MATH-1367
                 URL: https://issues.apache.org/jira/browse/MATH-1367
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.6.1
            Reporter: Amol Singh
             Fix For: 4.0


The DBSCAN paper describes the Eps-neighborhood of a point as follows:

https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf (Page 2)
Definition 1: (Eps-neighborhood of a point) The Eps-neighborhood of a point p, denoted by NEps(p), is defined by NEps(p) = {q ∈ D | dist(p,q)< Eps}.

In other words, every point q in the database D whose distance from p is less than Eps should be classified as a neighbor of p. This includes p itself, because dist(p,p) = 0 is less than any positive Eps.
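
To make the definition concrete, here is a minimal sketch that applies Definition 1 exactly as quoted above. The class name EpsNeighborhoodSketch, the method epsNeighborhood, and the use of double[] with Euclidean distance are assumptions made for illustration; none of them come from Commons Math.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of Commons Math.
class EpsNeighborhoodSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Definition 1 as quoted above: all q in D with dist(p, q) < eps.
    // There is deliberately no special case for q == p.
    static List<double[]> epsNeighborhood(double[] p, List<double[]> database, double eps) {
        List<double[]> result = new ArrayList<>();
        for (double[] q : database) {
            if (distance(p, q) < eps) {
                result.add(q);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> database = new ArrayList<>();
        double[] p = {0.0, 0.0};
        database.add(p);
        database.add(new double[] {0.5, 0.0});
        database.add(new double[] {3.0, 4.0});

        List<double[]> neighborhood = epsNeighborhood(p, database, 1.0);
        System.out.println(neighborhood.size());       // 2
        System.out.println(neighborhood.contains(p));  // true
    }
}
```

With Eps = 1.0, the neighborhood of p holds p itself and the point at distance 0.5; the point at distance 5 is excluded.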

The implementation, however, compares each candidate against the point by reference and does not add the point itself to its neighbor list.

    private List<T> getNeighbors(final T point, final Collection<T> points) {
        final List<T> neighbors = new ArrayList<T>();
        for (final T neighbor : points) {
            if (point != neighbor && distance(neighbor, point) <= eps) {
                neighbors.add(neighbor);
            }
        }
        return neighbors;
    }


The "point != neighbor" check should be removed here. The neighbor list should include the point itself. Keeping this check effectively raises the minPts threshold by 1.
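
The off-by-one effect can be shown with a small stand-alone sketch. It assumes that a point counts as a core point when its neighbor list has at least minPts entries. The class NeighborCountSketch, the skipSelf flag, the isCore helper, and the one-dimensional distance are all hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-alone sketch; not the Commons Math class itself.
class NeighborCountSketch {

    // Distance in one dimension, to keep the example short.
    static double distance(double[] a, double[] b) {
        return Math.abs(a[0] - b[0]);
    }

    // Neighbor search with an optional reference check.
    // skipSelf == true mirrors the method quoted above;
    // skipSelf == false is the change proposed in this report.
    static List<double[]> getNeighbors(double[] point, Collection<double[]> points,
                                       double eps, boolean skipSelf) {
        List<double[]> neighbors = new ArrayList<>();
        for (double[] neighbor : points) {
            if (skipSelf && point == neighbor) {
                continue;
            }
            if (distance(neighbor, point) <= eps) {
                neighbors.add(neighbor);
            }
        }
        return neighbors;
    }

    // Assumed core-point test: at least minPts entries in the neighbor list.
    static boolean isCore(double[] point, Collection<double[]> points,
                          double eps, int minPts, boolean skipSelf) {
        return getNeighbors(point, points, eps, skipSelf).size() >= minPts;
    }

    public static void main(String[] args) {
        double[] a = {0.0};
        double[] b = {0.4};
        double[] c = {0.8};
        List<double[]> points = Arrays.asList(a, b, c);

        // All three points lie within eps = 1.0 of a, and minPts = 3.
        System.out.println(isCore(a, points, 1.0, 3, true));   // false: only b and c are counted
        System.out.println(isCore(a, points, 1.0, 3, false));  // true: a, b and c are counted
    }
}
```

Under these assumptions, three mutually close points with minPts = 3 are all treated as non-core while the reference check is in place, and as core points once it is removed.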



