
Posted to commits@druid.apache.org by "sduffey-partnerize (via GitHub)" <gi...@apache.org> on 2023/03/13 12:17:58 UTC

[GitHub] [druid] sduffey-partnerize opened a new pull request, #13927: Introduce SQL interface for distinct count extension

sduffey-partnerize opened a new pull request, #13927:
URL: https://github.com/apache/druid/pull/13927

### Description

Introduces a SQL interface for the distinctcount extension via a new `SEGMENT_DISTINCT` function.

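Added `calcite` and `druid-sql` as dependencies of the distinctcount extension, then introduced `SegmentDistinctSqlAggregator`, an implementation of Druid's `SqlAggregator` interface (from `druid-sql`) that translates Calcite `SEGMENT_DISTINCT` aggregate calls into a `DistinctCountAggregatorFactory`.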
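
The description does not spell out what `SEGMENT_DISTINCT` returns. A rough mental model, assumed from the function's name and from `DistinctCountAggregatorFactory.combine` adding partial counts as longs: each segment counts its own distinct values, and the per-segment counts are summed. Under that assumption the result is exact only when no value appears in more than one segment (for example, when data is partitioned by the counted dimension). The class and method names below are hypothetical; this is plain Java, not Druid code.

```java
import java.util.HashSet;
import java.util.List;

/**
 * Hypothetical toy model (not Druid code) of a per-segment distinct count:
 * each "segment" counts its own distinct values, then the partial counts are
 * summed, mirroring the long addition in DistinctCountAggregatorFactory.combine.
 */
public class SegmentDistinctModel
{
  // Number of distinct values inside a single segment.
  static long distinctInSegment(List<String> segment)
  {
    return new HashSet<>(segment).size();
  }

  // Sum of per-segment counts; a value present in two segments is counted twice.
  static long segmentDistinct(List<List<String>> segments)
  {
    long total = 0L;
    for (List<String> segment : segments) {
      total += distinctInSegment(segment);
    }
    return total;
  }

  public static void main(String[] args)
  {
    // Partitioned by the dimension: each value lives in exactly one segment.
    List<List<String>> partitioned = List.of(List.of("a", "b", "a"), List.of("c", "d"));
    // Not partitioned: "a" appears in both segments and is double-counted.
    List<List<String>> unpartitioned = List.of(List.of("a", "b"), List.of("a", "c"));

    System.out.println(segmentDistinct(partitioned));   // 4 (exact)
    System.out.println(segmentDistinct(unpartitioned)); // 4, although only 3 distinct values exist
  }
}
```

If this model holds, the SQL documentation would likely need the same caveat as the native distinctcount docs about how the data is partitioned.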

I need some direction on documentation. For example, should the [distinctcount docs](https://github.com/apache/druid/blob/master/docs/development/extensions-contrib/distinctcount.md) include SQL equivalents of the existing examples? Is anything else needed?

#### Release note
New: You can now use the distinct count aggregator in SQL queries through the `SEGMENT_DISTINCT` function.

<hr>

##### Key changed/added classes in this PR
* `org.apache.druid.query.aggregation.distinctcount.sql.SegmentDistinctSqlAggregator`

<hr>

This PR has:

- [x] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

Re: [PR] Introduce SQL interface for distinct count extension (druid)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] closed pull request #13927: Introduce SQL interface for distinct count extension
URL: https://github.com/apache/druid/pull/13927



[GitHub] [druid] LakshSingla commented on pull request #13927: Introduce SQL interface for distinct count extension

Posted by "LakshSingla (via GitHub)" <gi...@apache.org>.

LakshSingla commented on PR #13927:
URL: https://github.com/apache/druid/pull/13927#issuecomment-1607033199

   Hi, @sduffey-partnerize! Did you make progress on the PR?
   Feel free to reach out to me in case you have any doubts regarding the comments! 



[GitHub] [druid] gianm commented on a diff in pull request #13927: Introduce SQL interface for distinct count extension

Posted by "gianm (via GitHub)" <gi...@apache.org>.

gianm commented on code in PR #13927:
URL: https://github.com/apache/druid/pull/13927#discussion_r1134659979


##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/DistinctCountAggregatorFactory.java:
##########
@@ -116,15 +116,6 @@ public int compare(Object o, Object o1)
   @Override
   public Object combine(Object lhs, Object rhs)
   {
-    if (lhs == null && rhs == null) {
-      return 0L;
-    }
-    if (rhs == null) {
-      return ((Number) lhs).longValue();
-    }
-    if (lhs == null) {
-      return ((Number) rhs).longValue();
-    }
     return ((Number) lhs).longValue() + ((Number) rhs).longValue();

Review Comment:
   This change makes `combine` no longer work on nulls; was that not needed for some reason?
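
For reference, here is a standalone sketch of the null-tolerant `combine` that this hunk removes, copied into a hypothetical class `NullSafeCombine` so it runs without Druid on the classpath. It treats a missing partial count as zero instead of calling `longValue()` on a null reference. When Druid actually passes nulls to `combine` is not shown in this thread.

```java
/**
 * Hypothetical standalone copy (not the Druid class) of the null-tolerant
 * combine logic from DistinctCountAggregatorFactory before this change.
 */
public class NullSafeCombine
{
  static Object combine(Object lhs, Object rhs)
  {
    if (lhs == null && rhs == null) {
      return 0L; // both partial counts missing: treat as zero
    }
    if (rhs == null) {
      return ((Number) lhs).longValue();
    }
    if (lhs == null) {
      return ((Number) rhs).longValue();
    }
    return ((Number) lhs).longValue() + ((Number) rhs).longValue();
  }

  public static void main(String[] args)
  {
    System.out.println(combine(3L, 4L));     // 7
    System.out.println(combine(null, 5));    // 5
    System.out.println(combine(null, null)); // 0
    // With only the final line kept, combine(null, 5) would throw a
    // NullPointerException when calling longValue() on the null lhs.
  }
}
```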



##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.distinctcount.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeFamily;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.util.Optionality;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.distinctcount.DistinctCountAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+
+
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );
+
+    final DruidExpression columnArg = Expressions.toDruidExpression(plannerContext, rowSignature, columnRexNode);
+    if (columnArg == null) {
+      return null;
+    }
+
+    final AggregatorFactory aggregatorFactory;
+    final String aggregatorName = finalizeAggregations ? Calcites.makePrefixedName(name, "a") : name;
+
+    if (columnArg.isDirectColumnAccess()
+        && rowSignature.getColumnType(columnArg.getDirectColumn()).map(type -> type.is(ValueType.COMPLEX)).orElse(false)) {
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, columnArg.getDirectColumn(), null);
+    } else {
+      final RelDataType dataType = columnRexNode.getType();
+      final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);
+
+      if (inputType == null) {
+        throw new ISE(
+            "Cannot translate sqlTypeName[%s] to Druid type for field[%s]",
+            dataType.getSqlTypeName(),
+            aggregatorName
+        );
+      }
+
+      final DimensionSpec dimensionSpec;
+
+      if (columnArg.isDirectColumnAccess()) {
+        dimensionSpec = columnArg.getSimpleExtraction().toDimensionSpec(null, inputType);
+      } else {
+        String virtualColumnName = virtualColumnRegistry.getOrCreateVirtualColumnForExpression(
+            columnArg,
+            dataType
+        );
+        dimensionSpec = new DefaultDimensionSpec(virtualColumnName, null, inputType);
+      }
+
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, dimensionSpec.getDimension(), null);
+    }
+
+    return Aggregation.create(aggregatorFactory);
+  }
+
+  private static class SegmentDistinctAggFunction extends SqlAggFunction
+  {
+    private static final String SIGNATURE = "'" + NAME + "(column, bitMapFactory)'\n";
+
+    SegmentDistinctAggFunction()
+    {
+      super(
+          NAME,
+          null,
+          SqlKind.OTHER_FUNCTION,
+          ReturnTypes.explicit(SqlTypeName.BIGINT),
+          InferTypes.VARCHAR_1024,
+          OperandTypes.or(
+              OperandTypes.ANY,
+              OperandTypes.and(
+                  OperandTypes.sequence(SIGNATURE, OperandTypes.ANY, OperandTypes.LITERAL),
+                  OperandTypes.family(SqlTypeFamily.ANY, SqlTypeFamily.STRING)

Review Comment:
   I don't see the LITERAL STRING argument being used in the function body. Is that intentional?




[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13927: Introduce SQL interface for distinct count extension

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.

github-code-scanning[bot] commented on code in PR #13927:
URL: https://github.com/apache/druid/pull/13927#discussion_r1139282636


##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,157 @@
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking `Expressions.fromFieldAccess` should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4400)




[GitHub] [druid] sduffey-partnerize commented on a diff in pull request #13927: Introduce SQL interface for distinct count extension

Posted by "sduffey-partnerize (via GitHub)" <gi...@apache.org>.

sduffey-partnerize commented on code in PR #13927:
URL: https://github.com/apache/druid/pull/13927#discussion_r1146211284


##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,157 @@
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );
+
+    final DruidExpression columnArg = Expressions.toDruidExpression(plannerContext, rowSignature, columnRexNode);
+    if (columnArg == null) {
+      return null;
+    }
+
+    final AggregatorFactory aggregatorFactory;
+    final String aggregatorName = finalizeAggregations ? Calcites.makePrefixedName(name, "a") : name;
+
+    if (columnArg.isDirectColumnAccess()
+        && rowSignature.getColumnType(columnArg.getDirectColumn()).map(type -> type.is(ValueType.COMPLEX)).orElse(false)) {
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, columnArg.getDirectColumn(), null);
+    } else {
+      final RelDataType dataType = columnRexNode.getType();
+      final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);
+
+      if (inputType == null) {
+        throw new ISE(
+            "Cannot translate sqlTypeName[%s] to Druid type for field[%s]",
+            dataType.getSqlTypeName(),
+            aggregatorName
+        );
+      }
+
+      final DimensionSpec dimensionSpec;
+
+      if (columnArg.isDirectColumnAccess()) {
+        dimensionSpec = columnArg.getSimpleExtraction().toDimensionSpec(null, inputType);
+      } else {
+        String virtualColumnName = virtualColumnRegistry.getOrCreateVirtualColumnForExpression(
+            columnArg,
+            dataType
+        );
+        dimensionSpec = new DefaultDimensionSpec(virtualColumnName, null, inputType);
+      }
+
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, dimensionSpec.getDimension(), null);
+    }
+
+    return Aggregation.create(aggregatorFactory);
+  }
+
+  private static class SegmentDistinctAggFunction extends SqlAggFunction
+  {
+    private static final String SIGNATURE = "'" + NAME + "(column, bitMapFactory)'\n";
+
+    SegmentDistinctAggFunction()
+    {
+      super(
+          NAME,
+          null,
+          SqlKind.OTHER_FUNCTION,
+          ReturnTypes.explicit(SqlTypeName.BIGINT),
+          InferTypes.VARCHAR_1024,
+          OperandTypes.or(
+              OperandTypes.ANY,
+              OperandTypes.and(
+                  OperandTypes.sequence(SIGNATURE, OperandTypes.ANY, OperandTypes.LITERAL),
+                  OperandTypes.family(SqlTypeFamily.ANY, SqlTypeFamily.STRING)

Review Comment:
   We looked back at other classes that extend `SqlAggFunction`, particularly `ApproxCountDistinctSqlAggFunction`, and noticed that it doesn't take a bitmap factory argument, so we simplified `SEGMENT_DISTINCT` the same way. Is that OK?




[GitHub] [druid] sduffey-partnerize commented on a diff in pull request #13927: Introduce SQL interface for distinct count extension

Posted by "sduffey-partnerize (via GitHub)" <gi...@apache.org>.

sduffey-partnerize commented on code in PR #13927:
URL: https://github.com/apache/druid/pull/13927#discussion_r1146182216


##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,157 @@
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );

Review Comment:
   Switched to the new four-argument overload of `Expressions.fromFieldAccess`, which takes the type factory as its first argument.



##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/DistinctCountAggregatorFactory.java:
##########
@@ -116,15 +116,6 @@ public int compare(Object o, Object o1)
   @Override
   public Object combine(Object lhs, Object rhs)
   {
-    if (lhs == null && rhs == null) {
-      return 0L;
-    }
-    if (rhs == null) {
-      return ((Number) lhs).longValue();
-    }
-    if (lhs == null) {
-      return ((Number) rhs).longValue();
-    }
     return ((Number) lhs).longValue() + ((Number) rhs).longValue();

Review Comment:
   Reverted




[GitHub] [druid] abhishekagarwal87 commented on a diff in pull request #13927: Introduce SQL interface for distinct count extension

Posted by "abhishekagarwal87 (via GitHub)" <gi...@apache.org>.

abhishekagarwal87 commented on code in PR #13927:
URL: https://github.com/apache/druid/pull/13927#discussion_r1177630130


##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,149 @@
+package org.apache.druid.query.aggregation.distinctcount.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.util.Optionality;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.distinctcount.DistinctCountAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+
+
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rexBuilder.getTypeFactory(),
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );
+
+    final DruidExpression columnArg = Expressions.toDruidExpression(plannerContext, rowSignature, columnRexNode);
+    if (columnArg == null) {
+      return null;
+    }
+
+    final AggregatorFactory aggregatorFactory;
+    final String aggregatorName = finalizeAggregations ? Calcites.makePrefixedName(name, "a") : name;
+
+    if (columnArg.isDirectColumnAccess()
+        && rowSignature.getColumnType(columnArg.getDirectColumn()).map(type -> type.is(ValueType.COMPLEX)).orElse(false)) {
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, columnArg.getDirectColumn(), null);
+    } else {
+      final RelDataType dataType = columnRexNode.getType();
+      final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);
+
+      if (inputType == null) {
+        throw new ISE(

Review Comment:
   You should use `org.apache.druid.sql.calcite.planner.UnsupportedSQLQueryException` instead of ISE. Please refer to the class documentation for why the former is preferred.
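A minimal sketch of the suggested swap, assuming only what the diff shows. The format string and arguments stay the same, but the error is raised as an "unsupported query" exception instead of an `ISE`. `UnsupportedSqlQueryStandIn`, `InputTypeCheckSketch` and `requireInputType` are hypothetical names written for this example so it compiles without Druid on the classpath. The real `UnsupportedSQLQueryException` constructor is assumed, not confirmed, to accept a format string plus arguments, so check its Javadoc before copying the pattern.

```java
/**
 * Hypothetical stand-in for org.apache.druid.sql.calcite.planner.UnsupportedSQLQueryException.
 * Assumed here to take a format string plus arguments, like ISE does. Not the real Druid class.
 */
class UnsupportedSqlQueryStandIn extends RuntimeException
{
  UnsupportedSqlQueryStandIn(String format, Object... args)
  {
    super(String.format(format, args));
  }
}

public class InputTypeCheckSketch
{
  /**
   * Mirrors the null check in toDruidAggregation. A null inputType stands for
   * Calcites.getColumnTypeForRelDataType(...) finding no Druid equivalent.
   */
  static String requireInputType(String inputType, String sqlTypeName, String fieldName)
  {
    if (inputType == null) {
      // Signal an unsupported-query error rather than an IllegalStateException (ISE).
      throw new UnsupportedSqlQueryStandIn(
          "Cannot translate sqlTypeName[%s] to Druid type for field[%s]",
          sqlTypeName,
          fieldName
      );
    }
    return inputType;
  }

  public static void main(String[] args)
  {
    try {
      requireInputType(null, "OTHER", "a0");
    }
    catch (UnsupportedSqlQueryStandIn e) {
      System.out.println(e.getMessage());
    }
  }
}
```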



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

Re: [PR] Introduce SQL interface for distinct count extension (druid)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #13927:
URL: https://github.com/apache/druid/pull/13927#issuecomment-1984824497

   This pull request/issue has been closed due to lack of activity. If you think that
   is incorrect, or the pull request requires review, you can revive the PR at any time.


[GitHub] [druid] LakshSingla commented on a diff in pull request #13927: Introduce SQL interface for distinct count extension

Posted by "LakshSingla (via GitHub)" <gi...@apache.org>.

LakshSingla commented on code in PR #13927:
URL: https://github.com/apache/druid/pull/13927#discussion_r1182063161


##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/DistinctCountAggregator.java:
##########
@@ -45,6 +45,7 @@ public void aggregate()
     IndexedInts row = selector.getRow();
     for (int i = 0, rowSize = row.size(); i < rowSize; i++) {
       int index = row.get(i);
+

Review Comment:
   nit: We can revert this change



##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.distinctcount.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.util.Optionality;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.distinctcount.DistinctCountAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+
+
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rexBuilder.getTypeFactory(),
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );
+
+    final DruidExpression columnArg = Expressions.toDruidExpression(plannerContext, rowSignature, columnRexNode);
+    if (columnArg == null) {
+      return null;
+    }
+
+    final AggregatorFactory aggregatorFactory;
+    final String aggregatorName = finalizeAggregations ? Calcites.makePrefixedName(name, "a") : name;
+
+    if (columnArg.isDirectColumnAccess()
+        && rowSignature.getColumnType(columnArg.getDirectColumn()).map(type -> type.is(ValueType.COMPLEX)).orElse(false)) {
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, columnArg.getDirectColumn(), null);
+    } else {
+      final RelDataType dataType = columnRexNode.getType();
+      final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);
+
+      if (inputType == null) {
+        throw new ISE(

Review Comment:
   Also, can you please explain why this inputType check is required? If we don't create the dimensionSpec below (as mentioned in another comment of mine), we probably won't run into an error with inputType being null in this code.
   Would a null inputType cause any issue in the aggregation? If so, can you please add a comment explaining it?



##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.distinctcount.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.util.Optionality;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.distinctcount.DistinctCountAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+
+
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rexBuilder.getTypeFactory(),
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );
+
+    final DruidExpression columnArg = Expressions.toDruidExpression(plannerContext, rowSignature, columnRexNode);
+    if (columnArg == null) {
+      return null;
+    }
+
+    final AggregatorFactory aggregatorFactory;
+    final String aggregatorName = finalizeAggregations ? Calcites.makePrefixedName(name, "a") : name;
+
+    if (columnArg.isDirectColumnAccess()
+        && rowSignature.getColumnType(columnArg.getDirectColumn()).map(type -> type.is(ValueType.COMPLEX)).orElse(false)) {
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, columnArg.getDirectColumn(), null);
+    } else {
+      final RelDataType dataType = columnRexNode.getType();
+      final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);
+
+      if (inputType == null) {
+        throw new ISE(
+            "Cannot translate sqlTypeName[%s] to Druid type for field[%s]",
+            dataType.getSqlTypeName(),
+            aggregatorName
+        );
+      }
+
+      final DimensionSpec dimensionSpec;
+
+      if (columnArg.isDirectColumnAccess()) {
+        dimensionSpec = columnArg.getSimpleExtraction().toDimensionSpec(null, inputType);
+      } else {
+        String virtualColumnName = virtualColumnRegistry.getOrCreateVirtualColumnForExpression(
+            columnArg,
+            dataType
+        );
+        dimensionSpec = new DefaultDimensionSpec(virtualColumnName, null, inputType);
+      }
+
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, dimensionSpec.getDimension(), null);

Review Comment:
   It seems slightly counter-intuitive that we create a dimension spec in the above cases just to call `dimensionSpec.getDimension()` when creating the final aggregator.
   
   Instead of Line#116, can we do `dimensionName = columnArg.getSimpleExtraction().getColumn()` (since it's a direct column access), and in Line#122 do `dimensionName = virtualColumnName`, then pass that name to the aggregator factory?
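A rough sketch of this suggestion, under stated assumptions: the aggregator only needs a column name, so each branch can produce that name directly and skip the `DimensionSpec`. `Expr` and `VirtualColumns` are hypothetical stand-ins for `DruidExpression` and `VirtualColumnRegistry` (the real `getOrCreateVirtualColumnForExpression` also takes the `RelDataType`), so this shows the control flow only, not Druid's API. The direct-access branch uses `getDirectColumn()`, which the diff already calls in the complex-column branch; the reviewer's `getSimpleExtraction().getColumn()` is presumably equivalent for a direct column access.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, self-contained sketch. Expr and VirtualColumns stand in for
 * Druid's DruidExpression and VirtualColumnRegistry; they are not the real APIs.
 */
public class DimensionNameSketch
{
  interface Expr
  {
    boolean isDirectColumnAccess();

    String getDirectColumn();

    String getExpression();
  }

  /** Stand-in registry that reuses one virtual column per distinct expression. */
  static class VirtualColumns
  {
    private final Map<String, String> byExpression = new HashMap<>();

    String getOrCreateVirtualColumnForExpression(Expr expr)
    {
      // Names are v0, v1, ... in order of first registration.
      return byExpression.computeIfAbsent(expr.getExpression(), e -> "v" + byExpression.size());
    }
  }

  /** Resolves the column the aggregator reads, without building a DimensionSpec. */
  static String resolveDimensionName(Expr columnArg, VirtualColumns registry)
  {
    if (columnArg.isDirectColumnAccess()) {
      return columnArg.getDirectColumn();
    }
    return registry.getOrCreateVirtualColumnForExpression(columnArg);
  }

  static Expr column(String name)
  {
    return new Expr()
    {
      public boolean isDirectColumnAccess() { return true; }
      public String getDirectColumn() { return name; }
      public String getExpression() { return "\"" + name + "\""; }
    };
  }

  static Expr expression(String expression)
  {
    return new Expr()
    {
      public boolean isDirectColumnAccess() { return false; }
      public String getDirectColumn() { return null; }
      public String getExpression() { return expression; }
    };
  }

  public static void main(String[] args)
  {
    VirtualColumns registry = new VirtualColumns();
    System.out.println(resolveDimensionName(column("page"), registry));
    System.out.println(resolveDimensionName(expression("lower(\"page\")"), registry));
  }
}
```

In the diff, `inputType` is consumed only when building the `DimensionSpec`, so once that spec goes away the remaining purpose of the `inputType` null check may be worth revisiting, as raised in the earlier comment.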



[GitHub] [druid] sduffey-partnerize commented on a diff in pull request #13927: Introduce SQL interface for distinct count extension

Posted by "sduffey-partnerize (via GitHub)" <gi...@apache.org>.

sduffey-partnerize commented on code in PR #13927:
URL: https://github.com/apache/druid/pull/13927#discussion_r1186863442


##########
extensions-contrib/distinctcount/src/main/java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.distinctcount.sql;
+
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexBuilder;
+import org.apache.calcite.rex.RexNode;
+import org.apache.calcite.sql.SqlAggFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.type.InferTypes;
+import org.apache.calcite.sql.type.OperandTypes;
+import org.apache.calcite.sql.type.ReturnTypes;
+import org.apache.calcite.sql.type.SqlTypeName;
+import org.apache.calcite.util.Optionality;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.query.aggregation.AggregatorFactory;
+import org.apache.druid.query.aggregation.distinctcount.DistinctCountAggregatorFactory;
+import org.apache.druid.query.dimension.DefaultDimensionSpec;
+import org.apache.druid.query.dimension.DimensionSpec;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.RowSignature;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.sql.calcite.aggregation.Aggregation;
+import org.apache.druid.sql.calcite.aggregation.SqlAggregator;
+import org.apache.druid.sql.calcite.expression.DruidExpression;
+import org.apache.druid.sql.calcite.expression.Expressions;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.PlannerContext;
+import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+
+
+public class SegmentDistinctSqlAggregator implements SqlAggregator
+{
+  private static final SqlAggFunction FUNCTION_INSTANCE = new SegmentDistinctAggFunction();
+  private static final String NAME = "SEGMENT_DISTINCT";
+
+  @Override
+  public SqlAggFunction calciteFunction()
+  {
+    return FUNCTION_INSTANCE;
+  }
+
+  @Nullable
+  @Override
+  public Aggregation toDruidAggregation(
+          PlannerContext plannerContext,
+          RowSignature rowSignature,
+          VirtualColumnRegistry virtualColumnRegistry,
+          RexBuilder rexBuilder,
+          String name,
+          AggregateCall aggregateCall,
+          Project project,
+          List<Aggregation> list,
+          boolean finalizeAggregations)
+  {
+
+    // Don't use Aggregations.getArgumentsForSimpleAggregator, since it won't let us use direct column access
+    // for string columns.
+    final RexNode columnRexNode = Expressions.fromFieldAccess(
+        rexBuilder.getTypeFactory(),
+        rowSignature,
+        project,
+        aggregateCall.getArgList().get(0)
+    );
+
+    final DruidExpression columnArg = Expressions.toDruidExpression(plannerContext, rowSignature, columnRexNode);
+    if (columnArg == null) {
+      return null;
+    }
+
+    final AggregatorFactory aggregatorFactory;
+    final String aggregatorName = finalizeAggregations ? Calcites.makePrefixedName(name, "a") : name;
+
+    if (columnArg.isDirectColumnAccess()
+        && rowSignature.getColumnType(columnArg.getDirectColumn()).map(type -> type.is(ValueType.COMPLEX)).orElse(false)) {
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, columnArg.getDirectColumn(), null);
+    } else {
+      final RelDataType dataType = columnRexNode.getType();
+      final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);
+
+      if (inputType == null) {
+        throw new ISE(
+            "Cannot translate sqlTypeName[%s] to Druid type for field[%s]",
+            dataType.getSqlTypeName(),
+            aggregatorName
+        );
+      }
+
+      final DimensionSpec dimensionSpec;
+
+      if (columnArg.isDirectColumnAccess()) {
+        dimensionSpec = columnArg.getSimpleExtraction().toDimensionSpec(null, inputType);
+      } else {
+        String virtualColumnName = virtualColumnRegistry.getOrCreateVirtualColumnForExpression(
+            columnArg,
+            dataType
+        );
+        dimensionSpec = new DefaultDimensionSpec(virtualColumnName, null, inputType);
+      }
+
+      aggregatorFactory = new DistinctCountAggregatorFactory(name, dimensionSpec.getDimension(), null);

Review Comment:
   Thanks for the feedback, will try that out!



Re: [PR] Introduce SQL interface for distinct count extension (druid)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #13927:
URL: https://github.com/apache/druid/pull/13927#issuecomment-1935131117

   This pull request has been marked as stale due to 60 days of inactivity.
   It will be closed in 4 weeks if no further activity occurs. If you think
   that's incorrect or this pull request should instead be reviewed, please simply
   write any comment. Even if closed, you can still revive the PR at any time or
   discuss it on the dev@druid.apache.org list.
   Thank you for your contributions.

