You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2022/07/29 07:25:03 UTC

[GitHub] [systemds] phaniarnab commented on a diff in pull request #1650: [SYSTEMDS-3390] Replace slicing for countDistinctApprox() row/col aggregation with direct ops

phaniarnab commented on code in PR #1650:
URL: https://github.com/apache/systemds/pull/1650#discussion_r932944938


##########
src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixCountDistinct.java:
##########
@@ -102,66 +114,255 @@ else if(in.getNonZeros() < minimumSize) {
 	 * 
 	 * Benefit: precise, but uses memory, on the scale of inputs number of distinct values.
 	 * 
-	 * @param in The input matrix to count number distinct values in
-	 * @return The absolute distinct count
+	 * @param blkIn The input matrix to count number distinct values in
+	 * @return A matrix block containing the absolute distinct count for the entire input or along given row/col axis
 	 */
-	private static int countDistinctValuesNaive(MatrixBlock in) {
+	private static MatrixBlock countDistinctValuesNaive(MatrixBlock blkIn, CountDistinctOperator op) {
+
+		if (blkIn.isEmpty()) {
+			return new MatrixBlock(1);
+		}
+		else if(blkIn instanceof CompressedMatrixBlock) {
+			throw new NotImplementedException("countDistinct() does not support CompressedMatrixBlock");
+		}
+
 		Set<Double> distinct = new HashSet<>();
+		MatrixBlock blkOut;
 		double[] data;
-		if(in.isEmpty())
-			return 1;
-		else if(in instanceof CompressedMatrixBlock)
-			throw new NotImplementedException();
 
-		long nonZeros = in.getNonZeros();
+		if (op.getDirection().isRowCol()) {
+			blkOut = new MatrixBlock(1, 1, false);
 
-		if(nonZeros != -1 && nonZeros < in.getNumColumns() * in.getNumRows()) {
-			distinct.add(0d);
-		}
+			long distinctCount = 0;
+			long nonZeros = blkIn.getNonZeros();
 
-		if(in.sparseBlock != null) {
-			SparseBlock sb = in.sparseBlock;
+			// Check if input matrix contains any 0 values for RowCol case.
+			// This does not apply to row/col case, where we count nnz per row or col during iteration.
+			if(nonZeros != -1 && nonZeros < blkIn.getNumColumns() * blkIn.getNumRows()) {
+				distinct.add(0d);
+			}
 
-			if(in.sparseBlock.isContiguous()) {
-				data = sb.values(0);
-				countDistinctValuesNaive(data, distinct);
+			if(blkIn.getSparseBlock() != null) {
+				SparseBlock sb = blkIn.getSparseBlock();
+				if(blkIn.getSparseBlock().isContiguous()) {
+					// COO, CSR
+					data = sb.values(0);
+					distinctCount = countDistinctValuesNaive(data, distinct);
+				} else {
+					// MCSR
+					for(int i = 0; i < blkIn.getNumRows(); i++) {
+						if(!sb.isEmpty(i)) {
+							data = blkIn.getSparseBlock().values(i);
+							distinctCount = countDistinctValuesNaive(data, distinct);
+						}
+					}
+				}
+			} else if(blkIn.getDenseBlock() != null) {
+				DenseBlock db = blkIn.getDenseBlock();
+				for (int i = 0; i <= db.numBlocks(); i++) {
+					data = db.valuesAt(i);
+					distinctCount = countDistinctValuesNaive(data, distinct);
+				}
 			}
-			else {
-				for(int i = 0; i < in.getNumRows(); i++) {
-					if(!sb.isEmpty(i)) {
-						data = in.sparseBlock.values(i);
+
+			blkOut.setValue(0, 0, distinctCount);
+		} else if (op.getDirection().isRow()) {
+			blkOut = blkIn.slice(0, blkIn.getNumRows() - 1, 0, 0);

Review Comment:
   Why deep copying (slice) the first column is required? You are anyway going to set individual values. Won't just allocating a dense column vector work? Something like below.
   `blkOut = new MatrixBlock(blkIn.getNumRows(), 1, false, blkIn.getNumRows());
    blkOut.allocateBlock();`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org