Posted to dev@madlib.apache.org by GitBox <gi...@apache.org> on 2020/05/05 18:42:41 UTC

[GitHub] [madlib] fmcquillan99 edited a comment on pull request #496: DBSCAN: Add new module DBSCAN

fmcquillan99 edited a comment on pull request #496:
URL: https://github.com/apache/madlib/pull/496#issuecomment-624228952


   (5)
   To be consistent with knn, can we please use `brute_force` as the name of the base algorithm? Currently `brute` is accepted but `brute_force` is not:
   ```
   SELECT madlib.dbscan( 
   				'dbscan_train_data',	-- source table
   				'dbscan_result',		-- output table
   				'pid',					-- point id column
   				'pointsxx',				-- data point
   				 1.75,					-- epsilon
   				 4,						-- min samples
   				 'dist_norm2',	-- metric
   				'brute_force');			-- algorithm
   
   ERROR:  plpy.Error: dbscan Error: algorithm has to be one of the following: brute, kd-tree (plpython.c:5038)
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "dbscan", line 21, in <module>
       return dbscan.dbscan(**globals())
     PL/Python function "dbscan", line 38, in dbscan
     PL/Python function "dbscan", line 184, in _validate_dbscan
     PL/Python function "dbscan", line 123, in _assert
   PL/Python function "dbscan"
   ```
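
One possible way to accept both spellings is to map them to a single canonical name before validating. The sketch below is only an illustration: `ALGORITHM_ALIASES` and `normalize_algorithm` are hypothetical names, not the module's actual code, and the error text is copied from the message above.

```python
# Hypothetical helper, not the actual MADlib implementation.
# Maps every accepted spelling to one canonical algorithm name.
ALGORITHM_ALIASES = {
    "brute": "brute_force",        # spelling currently accepted by dbscan
    "brute_force": "brute_force",  # spelling used by knn
    "kd-tree": "kd-tree",
}


def normalize_algorithm(algorithm):
    """Return the canonical algorithm name or raise ValueError."""
    key = algorithm.strip().lower()
    if key not in ALGORITHM_ALIASES:
        raise ValueError(
            "dbscan Error: algorithm has to be one of the following: "
            + ", ".join(sorted(ALGORITHM_ALIASES))
        )
    return ALGORITHM_ALIASES[key]
```

With a mapping like this, existing calls that pass `brute` keep working while `brute_force` becomes the documented name.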
   
   
   (6) 
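   It seems the column name `points` does not work, maybe it conflicts with an internal name? In the generated `UPDATE` below, the right-hand side of `SET points = points` is unqualified, so it matches a `points` column in both tables.
   I think it could be a common column name, so we should fix this.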
   ```
   SELECT madlib.dbscan( 
   				'dbscan_train_data',	-- source table
   				'dbscan_result',		-- output table
   				'pid',					-- point id column
   				'points',				-- data point
   				 1.75,					-- epsilon
   				 4,						-- min samples
   				 'dist_norm2',	-- metric
   				'brute');			-- algorithm
   
   ERROR:  plpy.SPIError: column reference "points" is ambiguous
   LINE 3:             SET points = points
                                    ^
   QUERY:  
               UPDATE dbscan_result AS t1
               SET points = points
               FROM dbscan_train_data AS t2
               WHERE t1.pid = t2.pid
           
   CONTEXT:  Traceback (most recent call last):
     PL/Python function "dbscan", line 21, in <module>
       return dbscan.dbscan(**globals())
     PL/Python function "dbscan", line 109, in dbscan
   PL/Python function "dbscan"
   ```
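
A minimal sketch of how the generated query could avoid the ambiguity is to qualify the source column with the `t2` alias. `build_copy_points_query` and its parameters are hypothetical and only mirror the query shown in the error above; the sketch assumes the output column is always named `points` and that identifiers have already been validated.

```python
# Hypothetical helper, not the actual MADlib implementation.
def build_copy_points_query(output_table, source_table, id_column,
                            source_point_column, output_point_column="points"):
    """Build an UPDATE that copies the point column from source to output.

    The source column is qualified with the t2 alias so that a user column
    with the same name as the output column is no longer ambiguous. The SET
    target stays unqualified, as PostgreSQL requires.
    """
    return """
        UPDATE {out} AS t1
        SET {out_col} = t2.{src_col}
        FROM {src} AS t2
        WHERE t1.{id_col} = t2.{id_col}
    """.format(out=output_table, src=source_table, id_col=id_column,
               src_col=source_point_column, out_col=output_point_column)


query = build_copy_points_query(
    "dbscan_result", "dbscan_train_data", "pid", "points")
```

Qualifying only the right-hand side is enough here, since the ambiguity comes from the unqualified source reference.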

