Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2022/11/08 11:52:51 UTC

[GitHub] [systemds] Baunsgaard commented on pull request #1714: [SYSTEMDS-3254] Add new aliases for countDistinct built-in function

Baunsgaard commented on PR #1714:
URL: https://github.com/apache/systemds/pull/1714#issuecomment-1307087006

   > > Hi @BACtaki !
   > > Thanks for the addition,
   > > When I made the task, I was trying to make the methods the same as in R, since much of the syntax is similar.
   > > https://www.geeksforgeeks.org/unique-function-in-r/
   > > Unfortunately, it seems I was a bit quick in my decision, since unique != count distinct,
   > > and what really is missing is the unique() function to return all unique elements in a matrix.
   > > Sorry for the inconvenience.
   > > Best regards
   > > Sebastian.
   > 
   > Thanks for clearing that up @Baunsgaard !
   > 
   > > [..] what really is missing is the unique() function to return all unique elements in a matrix.
   > 
   > I see a unique function in `Builtins.java`:
   > 
   > ```
   > 	UNIQUE("unique", true),
   > ```
   > 
   > In fact, there is an existing test for this function - `BuiltinUniqueTest.java` - that performs an equivalence check for the following:
   > 
   >     * `unique.dml`
   > 
   > 
   > ```
   > X = read($1);
   > R = unique(X = X);
   > write(R, $2);
   > ```
   > 
   >     * `unique.R`
   > 
   > 
   > ```
   > args<-commandArgs(TRUE)
   > options(digits=22)
   > library("Matrix")
   > 
   > X = as.matrix(readMM(paste(args[1], "X.mtx", sep="")));
   > R = unique(X[order(X[,1]),]);
   > writeMM(as(R, "CsparseMatrix"), paste(args[2], "R", sep=""));
   > ```
   > 
   > Is this the function you had in mind for the [JIRA](https://issues.apache.org/jira/browse/SYSTEMDS-3254) or was it something else?
   
   Yes, currently the test case should pass, but for the wrong reasons.
   I added it with the intention of comparing against R, but R does something different that the test does not capture: a unique call in R returns the distinct elements themselves rather than the number of distinct elements.
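
To make the distinction concrete, here is a minimal sketch in plain Python (not DML or R). The helper names `unique_values` and `count_distinct` are hypothetical and are not SystemDS or R functions. The sketch only looks at individual cell values of a small matrix and does not try to model how either system handles rows.

```python
# Hypothetical helpers for illustration only; not SystemDS or R APIs.

def unique_values(matrix):
    """Return the distinct cell values of a list-of-rows matrix, sorted."""
    return sorted({cell for row in matrix for cell in row})


def count_distinct(matrix):
    """Return how many distinct cell values the matrix contains."""
    return len(unique_values(matrix))


# A 3x2 matrix holding the values 1, 2 and 3 with repeats.
X = [[1.0, 2.0],
     [2.0, 3.0],
     [3.0, 3.0]]

distinct = unique_values(X)     # the distinct elements themselves
n_distinct = count_distinct(X)  # a single number: how many there are
```

The first result is a collection of values and the second is a scalar, so an alias that maps unique onto a count-distinct operation would hand back the wrong kind of result.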


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
