You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2022/08/03 20:51:06 UTC

[GitHub] [systemds] BACtaki opened a new pull request, #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

BACtaki opened a new pull request, #1677:
URL: https://github.com/apache/systemds/pull/1677

   This patch converts countDistinct() from a non-parameterized builtin to a parameterized builtin function to allow for 1 new parameter: dir for direction. The value of dir can be r and c, denoting row-wise and column-wise aggregation respectively. This patch only implements CP and the SP case will throw a NotImplementedException()- the latter case will be addressed in a subsequent patch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1002845067


##########
src/main/java/org/apache/sysds/runtime/instructions/SPInstructionParser.java:
##########
@@ -126,7 +126,9 @@ public class SPInstructionParser extends InstructionParser
 		String2SPInstructionType.put( "uac*"    , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uatrace" , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uaktrace", SPType.AggregateUnary);
-		String2SPInstructionType.put( "uacdap"  , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);

Review Comment:
   They are already there- they are just in a separate "section" for sketch operators:
   ```
                   [..]
   		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);
   
   		// Aggregate unary sketch operators
   		String2SPInstructionType.put( "uacdap" , SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapr", SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapc", SPType.AggregateUnarySketch);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1004038189


##########
src/main/java/org/apache/sysds/runtime/matrix/operators/AggregateUnaryOperator.java:
##########
@@ -35,7 +35,7 @@ public class AggregateUnaryOperator extends MultiThreadedOperator {
 	private static final long serialVersionUID = 6690553323120787735L;
 
 	public final AggregateOperator aggOp;
-	public final IndexFunction indexFn;
+	public IndexFunction indexFn;

Review Comment:
   Good point @Baunsgaard -  I refactored the `CountDistinctOperator` ctor to make the API more intuitive



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] Baunsgaard commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1003068071


##########
src/main/java/org/apache/sysds/lops/PartialAggregate.java:
##########
@@ -342,11 +342,20 @@ else if( dir == Direction.Col )
 			}
 
 			case COUNT_DISTINCT: {
-				if(dir == Direction.RowCol )
-					return "uacd";
+				switch (dir) {
+					case RowCol: return "uacd";
+					case Row: return "uacdr";
+					case Col: return "uacdc";

Review Comment:
   Maybe throw an exception here in the default case and not use the break.
   This would make errors more informative.



##########
src/main/java/org/apache/sysds/lops/PartialAggregate.java:
##########
@@ -355,6 +364,12 @@ else if( dir == Direction.Col )
 				}
 				break;
 			}
+

Review Comment:
   Same just above here , throw exception instead of break.



##########
src/main/java/org/apache/sysds/runtime/instructions/InstructionUtils.java:
##########
@@ -420,7 +413,40 @@ else if ( opcode.equalsIgnoreCase("uacmin") ) {
 			AggregateOperator agg = new AggregateOperator(Double.POSITIVE_INFINITY, Builtin.getBuiltinFnObject("min"));
 			aggun = new AggregateUnaryOperator(agg, ReduceRow.getReduceRowFnObject(), numThreads);
 		}
-		
+		else if ( opcode.equalsIgnoreCase("uacd") ) {
+			aggun = new CountDistinctOperator(AggregateUnaryCPInstruction.AUType.COUNT_DISTINCT)
+					.setDirection(Types.Direction.RowCol)
+					.setIndexFunction(ReduceAll.getReduceAllFnObject());
+		}
+		else if ( opcode.equalsIgnoreCase("uacdr") ) {
+			aggun = new CountDistinctOperator(AggregateUnaryCPInstruction.AUType.COUNT_DISTINCT)
+					.setDirection(Types.Direction.Row)
+					.setIndexFunction(ReduceCol.getReduceColFnObject());
+		}
+		else if ( opcode.equalsIgnoreCase("uacdc") ) {
+			aggun = new CountDistinctOperator(AggregateUnaryCPInstruction.AUType.COUNT_DISTINCT)
+					.setDirection(Types.Direction.Col)
+					.setIndexFunction(ReduceRow.getReduceRowFnObject());
+		}
+		else if ( opcode.equalsIgnoreCase("uacdap") ) {
+			aggun = new CountDistinctOperator(AggregateUnaryCPInstruction.AUType.COUNT_DISTINCT_APPROX)
+					.setDirection(Types.Direction.RowCol)
+					.setIndexFunction(ReduceAll.getReduceAllFnObject());
+		}
+		else if ( opcode.equalsIgnoreCase("uacdapr") ) {
+			aggun = new CountDistinctOperator(AggregateUnaryCPInstruction.AUType.COUNT_DISTINCT_APPROX)
+					.setDirection(Types.Direction.Row)
+					.setIndexFunction(ReduceCol.getReduceColFnObject());
+		}
+		else if ( opcode.equalsIgnoreCase("uacdapc") ) {
+			aggun = new CountDistinctOperator(AggregateUnaryCPInstruction.AUType.COUNT_DISTINCT_APPROX)
+					.setDirection(Types.Direction.Col)
+					.setIndexFunction(ReduceRow.getReduceRowFnObject());
+		}
+		else {
+			throw new IllegalArgumentException("Cannot parse given opcode");

Review Comment:
   I think this is what makes the tests fail. since some code rely on us returning the aggun as null i believe.



##########
src/main/java/org/apache/sysds/runtime/matrix/operators/AggregateUnaryOperator.java:
##########
@@ -35,7 +35,7 @@ public class AggregateUnaryOperator extends MultiThreadedOperator {
 	private static final long serialVersionUID = 6690553323120787735L;
 
 	public final AggregateOperator aggOp;
-	public final IndexFunction indexFn;
+	public IndexFunction indexFn;

Review Comment:
   if we can please keep it final.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] Baunsgaard commented on pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on PR #1677:
URL: https://github.com/apache/systemds/pull/1677#issuecomment-1294952795

   Thanks for the PR, while merging i changed a wildcard to individual imports.
   we should avoid wildcard imports.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on PR #1677:
URL: https://github.com/apache/systemds/pull/1677#issuecomment-1204471039

   @phaniarnab @Baunsgaard @j143, would love your feedback! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1002844611


##########
src/main/java/org/apache/sysds/runtime/instructions/InstructionUtils.java:
##########
@@ -420,7 +421,11 @@ else if ( opcode.equalsIgnoreCase("uacmin") ) {
 			AggregateOperator agg = new AggregateOperator(Double.POSITIVE_INFINITY, Builtin.getBuiltinFnObject("min"));
 			aggun = new AggregateUnaryOperator(agg, ReduceRow.getReduceRowFnObject(), numThreads);
 		}
-		
+		else if ( opcode.equalsIgnoreCase("uacd") || opcode.equalsIgnoreCase("uacdr")
+				|| opcode.equalsIgnoreCase("uacdc") ) {

Review Comment:
   The way it is written currently, `countDistinct()` and `countDistinctApprox()` are constructed directly in `AggregateUnaryCPInstruction` directly, i.e. outside `InstructionUtils.parseBasicAggregateUnaryOperator()`:
   
   ```
   	public static AggregateUnaryCPInstruction parseInstruction(String str) {
                   [..]
   		if(opcode.equalsIgnoreCase("nrow") || opcode.equalsIgnoreCase("ncol") 
   			|| opcode.equalsIgnoreCase("length") || opcode.equalsIgnoreCase("exists")
   			|| opcode.equalsIgnoreCase("lineage")){
                       [..]
   		} 
   		else if(opcode.equalsIgnoreCase("uacd")){
   			CountDistinctOperator op = new CountDistinctOperator(AUType.COUNT_DISTINCT)
   					.setDirection(Types.Direction.RowCol)
   					.setIndexFunction(ReduceAll.getReduceAllFnObject());
   
   			return new AggregateUnaryCPInstruction(op, in1, out, AUType.COUNT_DISTINCT, opcode, str);
   		}
   		else if(opcode.equalsIgnoreCase("uacdr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdap")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uarimax") || opcode.equalsIgnoreCase("uarimin")){
                       [..]
   		}
   		else { //DEFAULT BEHAVIOR
   			AggregateUnaryOperator aggun = InstructionUtils
   				.parseBasicAggregateUnaryOperator(opcode, Integer.parseInt(parts[3]));
   			return new AggregateUnaryCPInstruction(aggun, in1, out, AUType.DEFAULT, opcode, str);
   		}
   	}
   ```
   
   This is a result of `CountDistinctOperator` not inheriting from `AggregateUnaryOperator`. I have made this change in the next iteration. The switch logic has therefore been moved to `InstructionUtils.parseBasicAggregateUnaryOperator()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] Baunsgaard commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r953862878


##########
src/main/java/org/apache/sysds/runtime/instructions/SPInstructionParser.java:
##########
@@ -126,7 +126,9 @@ public class SPInstructionParser extends InstructionParser
 		String2SPInstructionType.put( "uac*"    , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uatrace" , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uaktrace", SPType.AggregateUnary);
-		String2SPInstructionType.put( "uacdap"  , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);

Review Comment:
   same here, should we add the approximate methods as well?



##########
src/test/scripts/functions/countDistinct/countDistinctRowCol.dml:
##########
@@ -0,0 +1,24 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+input = round(rand(rows = $2, cols = $3, min = 0, max = $1 -1, sparsity= $4,  seed = 7))
+res = countDistinct(input)  # default is dir="rc"

Review Comment:
   add another test for this, the call with "rc" should be equivalent to no "rc" argument.
   While it is trivial and known, it is an easy test to add and it will enforce it later on.



##########
src/main/java/org/apache/sysds/runtime/instructions/InstructionUtils.java:
##########
@@ -420,7 +421,11 @@ else if ( opcode.equalsIgnoreCase("uacmin") ) {
 			AggregateOperator agg = new AggregateOperator(Double.POSITIVE_INFINITY, Builtin.getBuiltinFnObject("min"));
 			aggun = new AggregateUnaryOperator(agg, ReduceRow.getReduceRowFnObject(), numThreads);
 		}
-		
+		else if ( opcode.equalsIgnoreCase("uacd") || opcode.equalsIgnoreCase("uacdr")
+				|| opcode.equalsIgnoreCase("uacdc") ) {

Review Comment:
   Should we add the approximate methods as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on PR #1677:
URL: https://github.com/apache/systemds/pull/1677#issuecomment-1209581154

   > I will look into this @BACtaki. Meanwhile, please check why the tests are failing.
   
   Thanks @phaniarnab ! Ack, let me look into the test failures.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1002844611


##########
src/main/java/org/apache/sysds/runtime/instructions/InstructionUtils.java:
##########
@@ -420,7 +421,11 @@ else if ( opcode.equalsIgnoreCase("uacmin") ) {
 			AggregateOperator agg = new AggregateOperator(Double.POSITIVE_INFINITY, Builtin.getBuiltinFnObject("min"));
 			aggun = new AggregateUnaryOperator(agg, ReduceRow.getReduceRowFnObject(), numThreads);
 		}
-		
+		else if ( opcode.equalsIgnoreCase("uacd") || opcode.equalsIgnoreCase("uacdr")
+				|| opcode.equalsIgnoreCase("uacdc") ) {

Review Comment:
   I have added them in a separate "section" for sketch operators:
   ```
   		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);
   
   		// Aggregate unary sketch operators
   		String2SPInstructionType.put( "uacdap" , SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapr", SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapc", SPType.AggregateUnarySketch);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1002844611


##########
src/main/java/org/apache/sysds/runtime/instructions/InstructionUtils.java:
##########
@@ -420,7 +421,11 @@ else if ( opcode.equalsIgnoreCase("uacmin") ) {
 			AggregateOperator agg = new AggregateOperator(Double.POSITIVE_INFINITY, Builtin.getBuiltinFnObject("min"));
 			aggun = new AggregateUnaryOperator(agg, ReduceRow.getReduceRowFnObject(), numThreads);
 		}
-		
+		else if ( opcode.equalsIgnoreCase("uacd") || opcode.equalsIgnoreCase("uacdr")
+				|| opcode.equalsIgnoreCase("uacdc") ) {

Review Comment:
   The way it is written currently, `countDistinct()` and `countDistinctApprox()` instructions are constructed directly in `AggregateUnaryCPInstruction` directly, i.e. outside `InstructionUtils.parseBasicAggregateUnaryOperator()`:
   
   ```
   	public static AggregateUnaryCPInstruction parseInstruction(String str) {
                   [..]
   		if(opcode.equalsIgnoreCase("nrow") || opcode.equalsIgnoreCase("ncol") 
   			|| opcode.equalsIgnoreCase("length") || opcode.equalsIgnoreCase("exists")
   			|| opcode.equalsIgnoreCase("lineage")){
                       [..]
   		} 
   		else if(opcode.equalsIgnoreCase("uacd")){
   			CountDistinctOperator op = new CountDistinctOperator(AUType.COUNT_DISTINCT)
   					.setDirection(Types.Direction.RowCol)
   					.setIndexFunction(ReduceAll.getReduceAllFnObject());
   
   			return new AggregateUnaryCPInstruction(op, in1, out, AUType.COUNT_DISTINCT, opcode, str);
   		}
   		else if(opcode.equalsIgnoreCase("uacdr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdap")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uarimax") || opcode.equalsIgnoreCase("uarimin")){
                       [..]
   		}
   		else { //DEFAULT BEHAVIOR
   			AggregateUnaryOperator aggun = InstructionUtils
   				.parseBasicAggregateUnaryOperator(opcode, Integer.parseInt(parts[3]));
   			return new AggregateUnaryCPInstruction(aggun, in1, out, AUType.DEFAULT, opcode, str);
   		}
   	}
   ```
   
   This is a result of `CountDistinctOperator` not inheriting from `AggregateUnaryOperator`. I have made this change in the next iteration. The switch logic has therefore been moved to `InstructionUtils.parseBasicAggregateUnaryOperator()`.



##########
src/main/java/org/apache/sysds/runtime/instructions/InstructionUtils.java:
##########
@@ -420,7 +421,11 @@ else if ( opcode.equalsIgnoreCase("uacmin") ) {
 			AggregateOperator agg = new AggregateOperator(Double.POSITIVE_INFINITY, Builtin.getBuiltinFnObject("min"));
 			aggun = new AggregateUnaryOperator(agg, ReduceRow.getReduceRowFnObject(), numThreads);
 		}
-		
+		else if ( opcode.equalsIgnoreCase("uacd") || opcode.equalsIgnoreCase("uacdr")
+				|| opcode.equalsIgnoreCase("uacdc") ) {

Review Comment:
   The way it is written currently, `countDistinct()` and `countDistinctApprox()` instructions are constructed directly in `AggregateUnaryCPInstruction`, i.e. outside `InstructionUtils.parseBasicAggregateUnaryOperator()`:
   
   ```
   	public static AggregateUnaryCPInstruction parseInstruction(String str) {
                   [..]
   		if(opcode.equalsIgnoreCase("nrow") || opcode.equalsIgnoreCase("ncol") 
   			|| opcode.equalsIgnoreCase("length") || opcode.equalsIgnoreCase("exists")
   			|| opcode.equalsIgnoreCase("lineage")){
                       [..]
   		} 
   		else if(opcode.equalsIgnoreCase("uacd")){
   			CountDistinctOperator op = new CountDistinctOperator(AUType.COUNT_DISTINCT)
   					.setDirection(Types.Direction.RowCol)
   					.setIndexFunction(ReduceAll.getReduceAllFnObject());
   
   			return new AggregateUnaryCPInstruction(op, in1, out, AUType.COUNT_DISTINCT, opcode, str);
   		}
   		else if(opcode.equalsIgnoreCase("uacdr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdap")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uarimax") || opcode.equalsIgnoreCase("uarimin")){
                       [..]
   		}
   		else { //DEFAULT BEHAVIOR
   			AggregateUnaryOperator aggun = InstructionUtils
   				.parseBasicAggregateUnaryOperator(opcode, Integer.parseInt(parts[3]));
   			return new AggregateUnaryCPInstruction(aggun, in1, out, AUType.DEFAULT, opcode, str);
   		}
   	}
   ```
   
   This is a result of `CountDistinctOperator` not inheriting from `AggregateUnaryOperator`. I have made this change in the next iteration. The switch logic has therefore been moved to `InstructionUtils.parseBasicAggregateUnaryOperator()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1002845067


##########
src/main/java/org/apache/sysds/runtime/instructions/SPInstructionParser.java:
##########
@@ -126,7 +126,9 @@ public class SPInstructionParser extends InstructionParser
 		String2SPInstructionType.put( "uac*"    , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uatrace" , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uaktrace", SPType.AggregateUnary);
-		String2SPInstructionType.put( "uacdap"  , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);

Review Comment:
   They are already there- they are just in a separate "section" for sketch operators:
   ```
   		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);
   
   		// Aggregate unary sketch operators
   		String2SPInstructionType.put( "uacdap" , SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapr", SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapc", SPType.AggregateUnarySketch);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1002845067


##########
src/main/java/org/apache/sysds/runtime/instructions/SPInstructionParser.java:
##########
@@ -126,7 +126,9 @@ public class SPInstructionParser extends InstructionParser
 		String2SPInstructionType.put( "uac*"    , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uatrace" , SPType.AggregateUnary);
 		String2SPInstructionType.put( "uaktrace", SPType.AggregateUnary);
-		String2SPInstructionType.put( "uacdap"  , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
+		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);

Review Comment:
   They are already there- they are in a separate "section" for sketch operators:
   ```
   		String2SPInstructionType.put( "uacd"    , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdr"   , SPType.AggregateUnary);
   		String2SPInstructionType.put( "uacdc"   , SPType.AggregateUnary);
   
   		// Aggregate unary sketch operators
   		String2SPInstructionType.put( "uacdap" , SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapr", SPType.AggregateUnarySketch);
   		String2SPInstructionType.put( "uacdapc", SPType.AggregateUnarySketch);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1004038189


##########
src/main/java/org/apache/sysds/runtime/matrix/operators/AggregateUnaryOperator.java:
##########
@@ -35,7 +35,7 @@ public class AggregateUnaryOperator extends MultiThreadedOperator {
 	private static final long serialVersionUID = 6690553323120787735L;
 
 	public final AggregateOperator aggOp;
-	public final IndexFunction indexFn;
+	public IndexFunction indexFn;

Review Comment:
   👍  I refactored the `CountDistinctOperator` ctor to make the API more intuitive



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] phaniarnab commented on pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
phaniarnab commented on PR #1677:
URL: https://github.com/apache/systemds/pull/1677#issuecomment-1207993196

   I will look into this @BACtaki. 
   Meanwhile, please check why the tests are failing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] Baunsgaard closed pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
Baunsgaard closed pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function
URL: https://github.com/apache/systemds/pull/1677


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on PR #1677:
URL: https://github.com/apache/systemds/pull/1677#issuecomment-1288337567

   > I like the PR, i do not see in the code why the tests would fail so i will force them to restart,
   The following careless edit was the culprit:
   ```
   @@ -2361,10 +2375,6 @@ public class DMLTranslator
    		case SUM:
    		case PROD:
    		case VAR:
   -		case COUNT_DISTINCT:
   -			currBuiltinOp = new AggUnaryOp(target.getName(), DataType.SCALAR, target.getValueType(),
   -					AggOp.valueOf(source.getOpCode().name()), Direction.RowCol, expr);
   -			break;
   ```
   
   > is the method supposed to be:
   > countDistinctCol(X) or countDistinct(X, "c")
   > Personally i would prefer if we had both, maybe @mboehm7 have an opinion?
   
   That is a great idea! I have added `countDistinctRow` and `countDistinctCol` aliases for `countDistinct(X, dir="r")` and `countDistinct(X, dir="c")` respectively.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] BACtaki commented on a diff in pull request #1677: [SYSTEMDS-3413] Add support for row/col aggregation to countDistinct builtin function

Posted by GitBox <gi...@apache.org>.
BACtaki commented on code in PR #1677:
URL: https://github.com/apache/systemds/pull/1677#discussion_r1002844611


##########
src/main/java/org/apache/sysds/runtime/instructions/InstructionUtils.java:
##########
@@ -420,7 +421,11 @@ else if ( opcode.equalsIgnoreCase("uacmin") ) {
 			AggregateOperator agg = new AggregateOperator(Double.POSITIVE_INFINITY, Builtin.getBuiltinFnObject("min"));
 			aggun = new AggregateUnaryOperator(agg, ReduceRow.getReduceRowFnObject(), numThreads);
 		}
-		
+		else if ( opcode.equalsIgnoreCase("uacd") || opcode.equalsIgnoreCase("uacdr")
+				|| opcode.equalsIgnoreCase("uacdc") ) {

Review Comment:
   The way it is written currently, `countDistinct()` and `countDistinctApprox()` are constructed directly in `AggregateUnaryCPInstruction` directly, i.e. outside `InstructionUtils.parseBasicAggregateUnaryOperator()`:
   
   ```
   	public static AggregateUnaryCPInstruction parseInstruction(String str) {
                   [..]
   		if(opcode.equalsIgnoreCase("nrow") || opcode.equalsIgnoreCase("ncol") 
   			|| opcode.equalsIgnoreCase("length") || opcode.equalsIgnoreCase("exists")
   			|| opcode.equalsIgnoreCase("lineage")){
                       [..]
   		} 
   		else if(opcode.equalsIgnoreCase("uacd")){
   			CountDistinctOperator op = new CountDistinctOperator(AUType.COUNT_DISTINCT)
   					.setDirection(Types.Direction.RowCol)
   					.setIndexFunction(ReduceAll.getReduceAllFnObject());
   
   			return new AggregateUnaryCPInstruction(op, in1, out, AUType.COUNT_DISTINCT, opcode, str);
   		}
   		else if(opcode.equalsIgnoreCase("uacdr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdap")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapr")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uacdapc")){
                       [..]
   		}
   		else if(opcode.equalsIgnoreCase("uarimax") || opcode.equalsIgnoreCase("uarimin")){
                       [..]
   		}
   		else { //DEFAULT BEHAVIOR
   			AggregateUnaryOperator aggun = InstructionUtils
   				.parseBasicAggregateUnaryOperator(opcode, Integer.parseInt(parts[3]));
   			return new AggregateUnaryCPInstruction(aggun, in1, out, AUType.DEFAULT, opcode, str);
   		}
   	}
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org