You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/05 23:24:24 UTC

[GitHub] [druid] maytasm3 opened a new pull request #9317: ANY Aggregator should not skip null values implementation

maytasm3 opened a new pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317
 
 
   <!-- Thanks for trying to help us make Apache Druid be the best it can be! Please fill out as much of the following information as is possible (where relevant, and remove it when irrelevant) to help make the intention and scope of this PR clear in order to ease review. -->
   
   Fixes #XXXX.
   
   <!-- Replace XXXX with the id of the issue fixed in this PR. Remove this section if there is no corresponding issue. Don't reference the issue in the title of this pull-request. -->
   
   <!-- If you are a committer, follow the PR action item checklist for committers:
   https://github.com/apache/druid/blob/master/dev/committer-instructions.md#pr-and-issue-action-item-checklist-for-committers. -->
   
   ### Description
   
   <!-- Describe the goal of this PR, what problem are you fixing. If there is a corresponding issue (referenced above), it's not necessary to repeat the description here, however, you may choose to keep one summary sentence. -->
   
   <!-- Describe your patch: what did you change in code? How did you fix the problem? -->
   
   <!-- If there are several relatively logically separate changes in this PR, create a mini-section for each of them. For example: -->
   
   #### Fixed the bug ...
   #### Renamed the class ...
   #### Added a forbidden-apis entry ...
   
   <!--
   In each section, please describe design decisions made, including:
    - Choice of algorithms
    - Behavioral aspects. What configuration values are acceptable? How are corner cases and error conditions handled, such as when there are insufficient resources?
    - Class organization and design (how the logic is split between classes, inheritance, composition, design patterns)
    - Method organization and design (how the logic is split between methods, parameters and return types)
    - Naming (class, method, API, configuration, HTTP endpoint, names of emitted metrics)
   -->
   
   
   <!-- It's good to describe an alternative design (or mention an alternative name) for every design (or naming) decision point and compare the alternatives with the designs that you've implemented (or the names you've chosen) to highlight the advantages of the chosen designs and names. -->
   
   <!-- If there was a discussion of the design of the feature implemented in this PR elsewhere (e. g. a "Proposal" issue, any other issue, or a thread in the development mailing list), link to that discussion from this PR description and explain what have changed in your final design compared to your original proposal or the consensus version in the end of the discussion. If something hasn't changed since the original discussion, you can omit a detailed discussion of those aspects of the design here, perhaps apart from brief mentioning for the sake of readability of this PR description. -->
   
   <!-- Some of the aspects mentioned above may be omitted for simple and small changes. -->
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `MyFoo`
    * `OurBar`
    * `TheirBaz`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377931492
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/DoubleAnyAggregator.java
 ##########
 @@ -19,44 +19,35 @@
 
 package org.apache.druid.query.aggregation.any;
 
-import org.apache.druid.query.aggregation.Aggregator;
-import org.apache.druid.query.aggregation.NullableNumericAggregator;
-import org.apache.druid.query.aggregation.NullableNumericAggregatorFactory;
 import org.apache.druid.segment.BaseDoubleColumnValueSelector;
 
+import javax.annotation.Nullable;
+
 /**
- * This Aggregator is created by the {@link DoubleAnyAggregatorFactory} which extends from
- * {@link NullableNumericAggregatorFactory}. If null needs to be handle, then {@link NullableNumericAggregatorFactory}
- * will wrap this aggregator in {@link NullableNumericAggregator} and can handle all null in that class.
- * Hence, no null will ever be pass into this aggregator from the valueSelector.
+ * This Aggregator is created by the {@link DoubleAnyAggregatorFactory} which has no special null handling logic.
 
 Review comment:
   I'm not sure this is really worth mentioning. This is the standard case for aggregator implementations, since it is less common for an agg implementation to have magic null handling wrapped around it than just handling the nulls itself.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377971577
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -205,7 +205,7 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
 |`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
-|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any non-null value of `expr`|
+|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any value of `expr` including null|
 
 Review comment:
   simplified as suggested. 
   expr needs to be null for ANY_VALUE(expr)
   String column have to be ANY_VALUE(expr, maxBytesPerString)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377935040
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
 
 Review comment:
   Please use `NullHandling.IS_NULL_BYTE` and `NullHandling.IS_NOT_NULL_BYTE` to be consistent with other aggregators, at least for the 'is null' byte.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378427106
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
+Returns any value including null. This aggregator simplify and optimize the performance by returning the first encoutnered value (including null)
 
 Review comment:
   oops done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378003007
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
+  private static final byte BYTE_FLAG_IS_SET = 1;
+  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
+  private static final int IS_NULL_FLAG_OFFSET_POSITION = IS_FOUND_FLAG_OFFSET_POSITION + Byte.BYTES;
+  private static final int FOUND_VALUE_OFFSET_POSITION = IS_NULL_FLAG_OFFSET_POSITION + Byte.BYTES;
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377773517
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any non-null value.
+If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
 
 Review comment:
   I think we should re-write this to explain why someone would use this aggregator similar to how it's explained in the snowflake docs - https://docs.snowflake.net/manuals/sql-reference/functions/any_value.html
   I'm not sure where the correct place in the docs is to explain this - since technically this is the spec for the native query and we have another page with a spec for sql.
   
   Here's my suggestion for the why:
   
   ```
   ANY aggregator can be used to simplify and optimize the performance of GROUP BY statements. A common problem for many queries is that the result of a query with a GROUP BY clause can only contain expressions used in the GROUP BY clause itself, or results of aggregate functions
   
   select customer.id , customer.name , sum(orders.value)
       from customer
       join orders on customer.id = orders.customer_id
       group by customer.id , customer.name;
   
   Since we know that customer.id can have only one name, this can be optimized as
   
   select customer.id , ANY(customer.name) , sum(orders.value)
       from customer
       join orders on customer.id = orders.customer_id
       group by customer.id , customer.name;
   ```
   
   I should also point out, with the current implementation of aggregators, there is no advantage to using an ANY aggregator vs a min aggregator, but maybe that will change in the future 🤷‍♂ 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378428398
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -32,11 +32,11 @@
 public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
     implements BufferAggregator
 {
-  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
-  private static final byte BYTE_FLAG_IS_SET = 1;
-  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
-  private static final int IS_NULL_FLAG_OFFSET_POSITION = IS_FOUND_FLAG_OFFSET_POSITION + Byte.BYTES;
-  private static final int FOUND_VALUE_OFFSET_POSITION = IS_NULL_FLAG_OFFSET_POSITION + Byte.BYTES;
+  // Rightmost bit for is null check (0 for is null and 1 for not null)
+  // Second rightmost bit for is found check (0 for not found and 1 for found)
+  private static final byte BYTE_FLAG_FOUND_MASK = 0b0010;
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377944007
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
+  private static final byte BYTE_FLAG_IS_SET = 1;
+  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
 
 Review comment:
   Further, since is null is only going to be either 0 or 1, you save a byte per agg result and could use any of the high bits of this null byte to set whether or not you have found a value. Instead of an offset for null flag, you define a bit mask and your found check becomes something like `buf.get(position) & FOUND_MASK == FOUND_MASK` or whatever.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378005102
 
 

 ##########
 File path: server/src/main/java/org/apache/druid/server/QueryLifecycle.java
 ##########
 @@ -318,14 +318,12 @@ public void emitLogsAndMetrics(
 
       if (e != null) {
         statsMap.put("exception", e.toString());
-
-        if (e instanceof QueryInterruptedException) {
-          // Mimic behavior from QueryResource, where this code was originally taken from.
-          log.noStackTrace().warn(e, "Exception while processing queryId [%s]", baseQuery.getId());
-          statsMap.put("interrupted", true);
-          statsMap.put("reason", e.toString());
-        }
+        // Mimic behavior from QueryResource, where this code was originally taken from.
 
 Review comment:
   The change is so that we can see the exception (and possibly stacktrace) for exceptions other than QueryInterruptedException too

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377929896
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -205,7 +205,7 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
 |`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
-|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any non-null value of `expr`|
+|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any value of `expr` including null|
 
 Review comment:
   I sort of think this should just be simplified so it is less confusing, maybe something like:
   
   > Returns any value of `expr`, including null.
   
   Also, expr does not need to be numeric since you implemented a string any aggregator in the previous PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378002862
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r376653240
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/DoubleAnyAggregatorFactory.java
 ##########
 @@ -19,103 +19,214 @@
 
 package org.apache.druid.query.aggregation.any;
 
-import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
-import org.apache.druid.math.expr.ExprMacroTable;
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.UOE;
+import org.apache.druid.query.aggregation.AggregateCombiner;
 import org.apache.druid.query.aggregation.Aggregator;
 import org.apache.druid.query.aggregation.AggregatorFactory;
 import org.apache.druid.query.aggregation.AggregatorUtil;
 import org.apache.druid.query.aggregation.BufferAggregator;
-import org.apache.druid.query.aggregation.SimpleDoubleAggregatorFactory;
+import org.apache.druid.query.aggregation.DoubleSumAggregator;
 import org.apache.druid.query.cache.CacheKeyBuilder;
 import org.apache.druid.segment.BaseDoubleColumnValueSelector;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.NilColumnValueSelector;
+import org.apache.druid.segment.column.ColumnHolder;
 
 import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
 import java.util.Collections;
+import java.util.Comparator;
 import java.util.List;
+import java.util.Objects;
 
-public class DoubleAnyAggregatorFactory extends SimpleDoubleAggregatorFactory
+public class DoubleAnyAggregatorFactory extends AggregatorFactory
 {
+  private static final Comparator<Double> VALUE_COMPARATOR = Comparator.nullsFirst(Double::compare);
+
+  private static final Aggregator NIL_AGGREGATOR = new DoubleAnyAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate()
+    {
+      // no-op
+    }
+  };
+
+  private static final BufferAggregator NIL_BUFFER_AGGREGATOR = new DoubleAnyBufferAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate(ByteBuffer buf, int position)
+    {
+      // no-op
+    }
+  };
+
+  private final String fieldName;
+  private final String name;
+  private final boolean storeDoubleAsFloat;
+
   @JsonCreator
   public DoubleAnyAggregatorFactory(
       @JsonProperty("name") String name,
-      @JsonProperty("fieldName") final String fieldName,
-      @JsonProperty("expression") @Nullable String expression,
-      @JacksonInject ExprMacroTable macroTable
+      @JsonProperty("fieldName") final String fieldName
   )
   {
-    super(macroTable, name, fieldName, expression);
-  }
+    Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator name");
+    Preconditions.checkNotNull(fieldName, "Must have a valid, non-null fieldName");
 
-  public DoubleAnyAggregatorFactory(String name, String fieldName)
-  {
-    this(name, fieldName, null, ExprMacroTable.nil());
+    this.name = name;
+    this.fieldName = fieldName;
+    this.storeDoubleAsFloat = ColumnHolder.storeDoubleAsFloat();
   }
 
   @Override
-  protected double nullValue()
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
   {
-    return Double.NaN;
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_AGGREGATOR;
+    } else {
+      return new DoubleAnyAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected Aggregator buildAggregator(BaseDoubleColumnValueSelector selector)
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
   {
-    return new DoubleAnyAggregator(selector);
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_BUFFER_AGGREGATOR;
+    } else {
+      return new DoubleAnyBufferAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected BufferAggregator buildBufferAggregator(BaseDoubleColumnValueSelector selector)
+  public Comparator getComparator()
   {
-    return new DoubleAnyBufferAggregator(selector);
+    return VALUE_COMPARATOR;
   }
 
   @Override
   @Nullable
   public Object combine(@Nullable Object lhs, @Nullable Object rhs)
   {
-    if (lhs != null) {
-      return lhs;
-    } else {
-      return rhs;
-    }
+    return lhs;
+  }
+
+  @Override
+  public AggregateCombiner makeAggregateCombiner()
+  {
+    throw new UOE("DoubleAnyAggregatorFactory is not supported during ingestion for rollup");
   }
 
   @Override
   public AggregatorFactory getCombiningFactory()
   {
-    return new DoubleAnyAggregatorFactory(name, name, null, macroTable);
+    return new DoubleAnyAggregatorFactory(name, name);
   }
 
   @Override
   public List<AggregatorFactory> getRequiredColumns()
   {
-    return Collections.singletonList(new DoubleAnyAggregatorFactory(fieldName, fieldName, expression, macroTable));
+    return Collections.singletonList(new DoubleAnyAggregatorFactory(fieldName, fieldName));
+  }
+
+  @Override
+  public Object deserialize(Object object)
+  {
+    // handle "NaN" / "Infinity" values serialized as strings in JSON
+    if (object instanceof String) {
+      return Double.parseDouble((String) object);
+    }
+    return object;
+  }
+
+  @Override
+  @Nullable
+  public Object finalizeComputation(@Nullable Object object)
+  {
+    return object;
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  @JsonProperty
+  public String getFieldName()
+  {
+    return fieldName;
+  }
+
+  @Override
+  public List<String> requiredFields()
+  {
+    return Collections.singletonList(fieldName);
   }
 
   @Override
   public byte[] getCacheKey()
   {
     return new CacheKeyBuilder(AggregatorUtil.DOUBLE_ANY_CACHE_TYPE_ID)
         .appendString(fieldName)
-        .appendString(expression)
         .build();
   }
 
+  @Override
+  public String getTypeName()
+  {
+    // if we don't pretend to be a primitive, group by v1 gets sad and doesn't work because no complex type serde
 
 Review comment:
   this comment seems not applicable since it is a primitive

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377813959
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any non-null value.
+If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
 
 Review comment:
   +1 to Suneet's rewording. Agreed as well that we should add the "why".

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377996826
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/DoubleAnyAggregatorFactory.java
 ##########
 @@ -19,103 +19,214 @@
 
 package org.apache.druid.query.aggregation.any;
 
-import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
-import org.apache.druid.math.expr.ExprMacroTable;
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.UOE;
+import org.apache.druid.query.aggregation.AggregateCombiner;
 import org.apache.druid.query.aggregation.Aggregator;
 import org.apache.druid.query.aggregation.AggregatorFactory;
 import org.apache.druid.query.aggregation.AggregatorUtil;
 import org.apache.druid.query.aggregation.BufferAggregator;
-import org.apache.druid.query.aggregation.SimpleDoubleAggregatorFactory;
+import org.apache.druid.query.aggregation.DoubleSumAggregator;
 import org.apache.druid.query.cache.CacheKeyBuilder;
 import org.apache.druid.segment.BaseDoubleColumnValueSelector;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.NilColumnValueSelector;
+import org.apache.druid.segment.column.ColumnHolder;
 
 import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
 import java.util.Collections;
+import java.util.Comparator;
 import java.util.List;
+import java.util.Objects;
 
-public class DoubleAnyAggregatorFactory extends SimpleDoubleAggregatorFactory
+public class DoubleAnyAggregatorFactory extends AggregatorFactory
 {
+  private static final Comparator<Double> VALUE_COMPARATOR = Comparator.nullsFirst(Double::compare);
+
+  private static final Aggregator NIL_AGGREGATOR = new DoubleAnyAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate()
+    {
+      // no-op
+    }
+  };
+
+  private static final BufferAggregator NIL_BUFFER_AGGREGATOR = new DoubleAnyBufferAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate(ByteBuffer buf, int position)
+    {
+      // no-op
+    }
+  };
+
+  private final String fieldName;
+  private final String name;
+  private final boolean storeDoubleAsFloat;
+
   @JsonCreator
   public DoubleAnyAggregatorFactory(
       @JsonProperty("name") String name,
-      @JsonProperty("fieldName") final String fieldName,
-      @JsonProperty("expression") @Nullable String expression,
-      @JacksonInject ExprMacroTable macroTable
+      @JsonProperty("fieldName") final String fieldName
   )
   {
-    super(macroTable, name, fieldName, expression);
-  }
+    Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator name");
+    Preconditions.checkNotNull(fieldName, "Must have a valid, non-null fieldName");
 
-  public DoubleAnyAggregatorFactory(String name, String fieldName)
-  {
-    this(name, fieldName, null, ExprMacroTable.nil());
+    this.name = name;
+    this.fieldName = fieldName;
+    this.storeDoubleAsFloat = ColumnHolder.storeDoubleAsFloat();
   }
 
   @Override
-  protected double nullValue()
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
   {
-    return Double.NaN;
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_AGGREGATOR;
+    } else {
+      return new DoubleAnyAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected Aggregator buildAggregator(BaseDoubleColumnValueSelector selector)
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
   {
-    return new DoubleAnyAggregator(selector);
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_BUFFER_AGGREGATOR;
+    } else {
+      return new DoubleAnyBufferAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected BufferAggregator buildBufferAggregator(BaseDoubleColumnValueSelector selector)
+  public Comparator getComparator()
   {
-    return new DoubleAnyBufferAggregator(selector);
+    return VALUE_COMPARATOR;
   }
 
   @Override
   @Nullable
   public Object combine(@Nullable Object lhs, @Nullable Object rhs)
   {
-    if (lhs != null) {
-      return lhs;
-    } else {
-      return rhs;
-    }
+    return lhs;
+  }
+
+  @Override
+  public AggregateCombiner makeAggregateCombiner()
+  {
+    throw new UOE("DoubleAnyAggregatorFactory is not supported during ingestion for rollup");
   }
 
   @Override
   public AggregatorFactory getCombiningFactory()
   {
-    return new DoubleAnyAggregatorFactory(name, name, null, macroTable);
+    return new DoubleAnyAggregatorFactory(name, name);
   }
 
   @Override
   public List<AggregatorFactory> getRequiredColumns()
   {
-    return Collections.singletonList(new DoubleAnyAggregatorFactory(fieldName, fieldName, expression, macroTable));
+    return Collections.singletonList(new DoubleAnyAggregatorFactory(fieldName, fieldName));
+  }
+
+  @Override
+  public Object deserialize(Object object)
+  {
+    // handle "NaN" / "Infinity" values serialized as strings in JSON
+    if (object instanceof String) {
+      return Double.parseDouble((String) object);
+    }
+    return object;
+  }
+
+  @Override
+  @Nullable
+  public Object finalizeComputation(@Nullable Object object)
+  {
+    return object;
+  }
+
+  @Override
+  @JsonProperty
+  public String getName()
+  {
+    return name;
+  }
+
+  @JsonProperty
+  public String getFieldName()
+  {
+    return fieldName;
+  }
+
+  @Override
+  public List<String> requiredFields()
+  {
+    return Collections.singletonList(fieldName);
   }
 
   @Override
   public byte[] getCacheKey()
   {
     return new CacheKeyBuilder(AggregatorUtil.DOUBLE_ANY_CACHE_TYPE_ID)
         .appendString(fieldName)
-        .appendString(expression)
         .build();
   }
 
+  @Override
+  public String getTypeName()
+  {
+    // if we don't pretend to be a primitive, group by v1 gets sad and doesn't work because no complex type serde
 
 Review comment:
   removed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377977604
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/DoubleAnyAggregatorFactory.java
 ##########
 @@ -19,103 +19,214 @@
 
 package org.apache.druid.query.aggregation.any;
 
-import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
-import org.apache.druid.math.expr.ExprMacroTable;
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.UOE;
+import org.apache.druid.query.aggregation.AggregateCombiner;
 import org.apache.druid.query.aggregation.Aggregator;
 import org.apache.druid.query.aggregation.AggregatorFactory;
 import org.apache.druid.query.aggregation.AggregatorUtil;
 import org.apache.druid.query.aggregation.BufferAggregator;
-import org.apache.druid.query.aggregation.SimpleDoubleAggregatorFactory;
+import org.apache.druid.query.aggregation.DoubleSumAggregator;
 import org.apache.druid.query.cache.CacheKeyBuilder;
 import org.apache.druid.segment.BaseDoubleColumnValueSelector;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.NilColumnValueSelector;
+import org.apache.druid.segment.column.ColumnHolder;
 
 import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
 import java.util.Collections;
+import java.util.Comparator;
 import java.util.List;
+import java.util.Objects;
 
-public class DoubleAnyAggregatorFactory extends SimpleDoubleAggregatorFactory
+public class DoubleAnyAggregatorFactory extends AggregatorFactory
 {
+  private static final Comparator<Double> VALUE_COMPARATOR = Comparator.nullsFirst(Double::compare);
+
+  private static final Aggregator NIL_AGGREGATOR = new DoubleAnyAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate()
+    {
+      // no-op
+    }
+  };
+
+  private static final BufferAggregator NIL_BUFFER_AGGREGATOR = new DoubleAnyBufferAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate(ByteBuffer buf, int position)
+    {
+      // no-op
+    }
+  };
+
+  private final String fieldName;
+  private final String name;
+  private final boolean storeDoubleAsFloat;
+
   @JsonCreator
   public DoubleAnyAggregatorFactory(
       @JsonProperty("name") String name,
-      @JsonProperty("fieldName") final String fieldName,
-      @JsonProperty("expression") @Nullable String expression,
-      @JacksonInject ExprMacroTable macroTable
+      @JsonProperty("fieldName") final String fieldName
   )
   {
-    super(macroTable, name, fieldName, expression);
-  }
+    Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator name");
+    Preconditions.checkNotNull(fieldName, "Must have a valid, non-null fieldName");
 
-  public DoubleAnyAggregatorFactory(String name, String fieldName)
-  {
-    this(name, fieldName, null, ExprMacroTable.nil());
+    this.name = name;
+    this.fieldName = fieldName;
+    this.storeDoubleAsFloat = ColumnHolder.storeDoubleAsFloat();
   }
 
   @Override
-  protected double nullValue()
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
   {
-    return Double.NaN;
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_AGGREGATOR;
+    } else {
+      return new DoubleAnyAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected Aggregator buildAggregator(BaseDoubleColumnValueSelector selector)
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
   {
-    return new DoubleAnyAggregator(selector);
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_BUFFER_AGGREGATOR;
+    } else {
+      return new DoubleAnyBufferAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected BufferAggregator buildBufferAggregator(BaseDoubleColumnValueSelector selector)
+  public Comparator getComparator()
   {
-    return new DoubleAnyBufferAggregator(selector);
+    return VALUE_COMPARATOR;
   }
 
   @Override
   @Nullable
   public Object combine(@Nullable Object lhs, @Nullable Object rhs)
   {
-    if (lhs != null) {
-      return lhs;
-    } else {
-      return rhs;
-    }
+    return lhs;
 
 Review comment:
   to be consistent with the new policy of not discriminating null values #equality 
   Decided not to make assumption in preferring non-null value. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377813959
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any non-null value.
+If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
 
 Review comment:
   +1 to Suneet's rewording. Agreed as well that we should add the "why". Have filed https://implydata.atlassian.net/browse/DOCS-30 to track. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377777664
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any non-null value.
+If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
 
 Review comment:
   Suggested re-wording to explain how null handling works
   
   ```
   This aggregator will return any value for the provided expression. It does not prefer non-null values over null values. If `druid.generic.useDefaultValueForNull` is set to true, this will not return null, but instead return the default value. Otherwise this aggregator may return null
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377942115
 
 

 ##########
 File path: server/src/main/java/org/apache/druid/server/QueryLifecycle.java
 ##########
 @@ -318,14 +318,12 @@ public void emitLogsAndMetrics(
 
       if (e != null) {
         statsMap.put("exception", e.toString());
-
-        if (e instanceof QueryInterruptedException) {
-          // Mimic behavior from QueryResource, where this code was originally taken from.
-          log.noStackTrace().warn(e, "Exception while processing queryId [%s]", baseQuery.getId());
-          statsMap.put("interrupted", true);
-          statsMap.put("reason", e.toString());
-        }
+        // Mimic behavior from QueryResource, where this code was originally taken from.
 
 Review comment:
   Why this change? I don't think it is correct, since this stuff should only be set if it was a `QueryInterruptedException`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378152939
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -32,11 +32,11 @@
 public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
     implements BufferAggregator
 {
-  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
-  private static final byte BYTE_FLAG_IS_SET = 1;
-  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
-  private static final int IS_NULL_FLAG_OFFSET_POSITION = IS_FOUND_FLAG_OFFSET_POSITION + Byte.BYTES;
-  private static final int FOUND_VALUE_OFFSET_POSITION = IS_NULL_FLAG_OFFSET_POSITION + Byte.BYTES;
+  // Rightmost bit for is null check (0 for is null and 1 for not null)
+  // Second rightmost bit for is found check (0 for not found and 1 for found)
+  private static final byte BYTE_FLAG_FOUND_MASK = 0b0010;
 
 Review comment:
   super super nit, but can you use hex? (`0x02` and `0x01` for `BYTE_FLAG_NULL_MASK`)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378002826
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
+  private static final byte BYTE_FLAG_IS_SET = 1;
+  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
 
 Review comment:
   Done

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378004999
 
 

 ##########
 File path: server/src/main/java/org/apache/druid/server/QueryLifecycle.java
 ##########
 @@ -318,14 +318,12 @@ public void emitLogsAndMetrics(
 
       if (e != null) {
         statsMap.put("exception", e.toString());
-
-        if (e instanceof QueryInterruptedException) {
-          // Mimic behavior from QueryResource, where this code was originally taken from.
-          log.noStackTrace().warn(e, "Exception while processing queryId [%s]", baseQuery.getId());
-          statsMap.put("interrupted", true);
-          statsMap.put("reason", e.toString());
-        }
+        // Mimic behavior from QueryResource, where this code was originally taken from.
 
 Review comment:
   Changed to just moving the logging out of the if check

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378151135
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -205,7 +205,7 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
 |`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
-|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any value of `expr` including null|
+|`ANY_VALUE(expr)`|Returns any value of `expr` including null. `expr` must be numeric. This aggregator simplify and optimize the performance by returning the first encoutnered value (including null)|
 
 Review comment:
   nit: should be 'This aggregator _can_ simplify..' 
   
   Should this note also be added to the description of string any value?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377934075
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
+  private static final byte BYTE_FLAG_IS_SET = 1;
+  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
 
 Review comment:
   nit: I think this `0` offset variable makes the put and get operations more complicated than they need to be, suggest just dropping this and not adding or removing anything.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sthetland commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377814677
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -205,7 +205,7 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
 |`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
-|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any non-null value of `expr`|
+|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any value of `expr` including null|
 
 Review comment:
   nit, adding a comma: "If `druid.generic.useDefaultValueForNull=true`, this can return..."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377802461
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any non-null value.
+If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
 
 Review comment:
   EDIT: @maytasm3 pointed out this benchmark - https://static.imply.io/gianm/vb.html which shows the performance gain of not having to read a value. Since ANY only reads the first value, this will be faster than min, even thought the aggregator has to loop over all values. 
   
   I should have known that... 🤦‍♂ 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377965421
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any non-null value.
+If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
 
 Review comment:
   Added the why (short version of what Suneet wrote)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378428020
 
 

 ##########
 File path: docs/querying/sql.md
 ##########
 @@ -205,7 +205,7 @@ Only the COUNT aggregation can accept DISTINCT.
 |`EARLIEST(expr, maxBytesPerString)`|Like `EARLIEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
 |`LATEST(expr)`|Returns the latest value of `expr`, which must be numeric. If `expr` comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If `expr` does not come from a relation with a timestamp, then it is simply the last value encountered.|
 |`LATEST(expr, maxBytesPerString)`|Like `LATEST(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
-|`ANY_VALUE(expr)`|Returns any value of `expr`, which must be numeric. If `druid.generic.useDefaultValueForNull=true` this can return the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then this will return any value of `expr` including null|
+|`ANY_VALUE(expr)`|Returns any value of `expr` including null. `expr` must be numeric. This aggregator simplify and optimize the performance by returning the first encoutnered value (including null)|
 
 Review comment:
   Added 'can'. The string any indicates "Like `ANY_VALUE(expr)`" which implies that whatever is mentioned for the `ANY_VALUE(expr)` also applies to the string any. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
maytasm3 commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377997282
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/DoubleAnyAggregator.java
 ##########
 @@ -19,44 +19,35 @@
 
 package org.apache.druid.query.aggregation.any;
 
-import org.apache.druid.query.aggregation.Aggregator;
-import org.apache.druid.query.aggregation.NullableNumericAggregator;
-import org.apache.druid.query.aggregation.NullableNumericAggregatorFactory;
 import org.apache.druid.segment.BaseDoubleColumnValueSelector;
 
+import javax.annotation.Nullable;
+
 /**
- * This Aggregator is created by the {@link DoubleAnyAggregatorFactory} which extends from
- * {@link NullableNumericAggregatorFactory}. If null needs to be handle, then {@link NullableNumericAggregatorFactory}
- * will wrap this aggregator in {@link NullableNumericAggregator} and can handle all null in that class.
- * Hence, no null will ever be pass into this aggregator from the valueSelector.
+ * This Aggregator is created by the {@link DoubleAnyAggregatorFactory} which has no special null handling logic.
 
 Review comment:
   Removed

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r376649135
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/DoubleAnyAggregatorFactory.java
 ##########
 @@ -19,103 +19,214 @@
 
 package org.apache.druid.query.aggregation.any;
 
-import com.fasterxml.jackson.annotation.JacksonInject;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
-import org.apache.druid.math.expr.ExprMacroTable;
+import com.google.common.base.Preconditions;
+import org.apache.druid.java.util.common.UOE;
+import org.apache.druid.query.aggregation.AggregateCombiner;
 import org.apache.druid.query.aggregation.Aggregator;
 import org.apache.druid.query.aggregation.AggregatorFactory;
 import org.apache.druid.query.aggregation.AggregatorUtil;
 import org.apache.druid.query.aggregation.BufferAggregator;
-import org.apache.druid.query.aggregation.SimpleDoubleAggregatorFactory;
+import org.apache.druid.query.aggregation.DoubleSumAggregator;
 import org.apache.druid.query.cache.CacheKeyBuilder;
 import org.apache.druid.segment.BaseDoubleColumnValueSelector;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.NilColumnValueSelector;
+import org.apache.druid.segment.column.ColumnHolder;
 
 import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
 import java.util.Collections;
+import java.util.Comparator;
 import java.util.List;
+import java.util.Objects;
 
-public class DoubleAnyAggregatorFactory extends SimpleDoubleAggregatorFactory
+public class DoubleAnyAggregatorFactory extends AggregatorFactory
 {
+  private static final Comparator<Double> VALUE_COMPARATOR = Comparator.nullsFirst(Double::compare);
+
+  private static final Aggregator NIL_AGGREGATOR = new DoubleAnyAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate()
+    {
+      // no-op
+    }
+  };
+
+  private static final BufferAggregator NIL_BUFFER_AGGREGATOR = new DoubleAnyBufferAggregator(
+      NilColumnValueSelector.instance()
+  )
+  {
+    @Override
+    public void aggregate(ByteBuffer buf, int position)
+    {
+      // no-op
+    }
+  };
+
+  private final String fieldName;
+  private final String name;
+  private final boolean storeDoubleAsFloat;
+
   @JsonCreator
   public DoubleAnyAggregatorFactory(
       @JsonProperty("name") String name,
-      @JsonProperty("fieldName") final String fieldName,
-      @JsonProperty("expression") @Nullable String expression,
-      @JacksonInject ExprMacroTable macroTable
+      @JsonProperty("fieldName") final String fieldName
   )
   {
-    super(macroTable, name, fieldName, expression);
-  }
+    Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator name");
+    Preconditions.checkNotNull(fieldName, "Must have a valid, non-null fieldName");
 
-  public DoubleAnyAggregatorFactory(String name, String fieldName)
-  {
-    this(name, fieldName, null, ExprMacroTable.nil());
+    this.name = name;
+    this.fieldName = fieldName;
+    this.storeDoubleAsFloat = ColumnHolder.storeDoubleAsFloat();
   }
 
   @Override
-  protected double nullValue()
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
   {
-    return Double.NaN;
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_AGGREGATOR;
+    } else {
+      return new DoubleAnyAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected Aggregator buildAggregator(BaseDoubleColumnValueSelector selector)
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
   {
-    return new DoubleAnyAggregator(selector);
+    final BaseDoubleColumnValueSelector valueSelector = metricFactory.makeColumnValueSelector(fieldName);
+    if (valueSelector instanceof NilColumnValueSelector) {
+      return NIL_BUFFER_AGGREGATOR;
+    } else {
+      return new DoubleAnyBufferAggregator(
+          valueSelector
+      );
+    }
   }
 
   @Override
-  protected BufferAggregator buildBufferAggregator(BaseDoubleColumnValueSelector selector)
+  public Comparator getComparator()
   {
-    return new DoubleAnyBufferAggregator(selector);
+    return VALUE_COMPARATOR;
   }
 
   @Override
   @Nullable
   public Object combine(@Nullable Object lhs, @Nullable Object rhs)
   {
-    if (lhs != null) {
-      return lhs;
-    } else {
-      return rhs;
-    }
+    return lhs;
 
 Review comment:
   I think from like a .. user satisfaction perspective, it might still be nice to prefer non-null values since it is still legitimate.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377938220
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
+  private static final byte BYTE_FLAG_IS_SET = 1;
+  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
+  private static final int IS_NULL_FLAG_OFFSET_POSITION = IS_FOUND_FLAG_OFFSET_POSITION + Byte.BYTES;
+  private static final int FOUND_VALUE_OFFSET_POSITION = IS_NULL_FLAG_OFFSET_POSITION + Byte.BYTES;
 
 Review comment:
   nit: suggest just making this package private and having subclasses use this directly in their putValue implementations instead of `getFoundValueStoredPosition` function call, and maybe just calling it `VALUE_OFFSET` or `FOUND_VALUE_OFFSET` since the position seems redundant.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r378150348
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
+Returns any value including null. This aggregator simplify and optimize the performance by returning the first encoutnered value (including null)
 
 Review comment:
   typo 'encoutnered' is causing CI failure

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377773517
 
 

 ##########
 File path: docs/querying/aggregations.md
 ##########
 @@ -238,7 +238,7 @@ Note that queries with first/last aggregators on a segment created with rollup e
 
 (Double/Float/Long/String) ANY aggregator cannot be used in ingestion spec, and should only be specified as part of queries.
 
-If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any non-null value.
+If `druid.generic.useDefaultValueForNull=true` aggregation can returns the default value for null and does not prefer "non-null" values over the default value for null. If `druid.generic.useDefaultValueForNull=false`, then aggregation will returns any value including null.
 
 Review comment:
   I think we should re-write this to explain why someone would use this aggregator similar to how it's explained in the snowflake docs - https://docs.snowflake.net/manuals/sql-reference/functions/any_value.html
   I'm not sure where the correct place in the docs is to explain this - since technically this is the spec for the native query and we have another page with a spec for sql.
   
   Here's my suggestion for the why:
   
   ```
   ANY aggregator can be used to simplify and optimize the performance of GROUP BY statements. A common problem for many queries is that the result of a query with a GROUP BY clause can only contain expressions used in the GROUP BY clause itself, or results of aggregate functions
   
   select customer.id , customer.name , sum(orders.value)
       from customer
       join orders on customer.id = orders.customer_id
       group by customer.id , customer.name;
   
   Since we know that each customer.id can have only one name, this can be optimized as
   
   select customer.id , ANY(customer.name) , sum(orders.value)
       from customer
       join orders on customer.id = orders.customer_id
       group by customer.id ;
   ```
   
   I should also point out, with the current implementation of aggregators, there is no advantage to using an ANY aggregator vs a min aggregator, but maybe that will change in the future 🤷‍♂ 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on issue #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
ccaominh commented on issue #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#issuecomment-583782782
 
 
   Static analysis is flagging unused imports

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9317: ANY Aggregator should not skip null values implementation
URL: https://github.com/apache/druid/pull/9317#discussion_r377944007
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/query/aggregation/any/NumericAnyBufferAggregator.java
 ##########
 @@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.any;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.BufferAggregator;
+import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import org.apache.druid.segment.BaseNullableColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for buffer based 'any' aggregator for primitive numeric column selectors
+ */
+public abstract class NumericAnyBufferAggregator<TSelector extends BaseNullableColumnValueSelector>
+    implements BufferAggregator
+{
+  private static final byte BYTE_FLAG_IS_NOT_SET = 0;
+  private static final byte BYTE_FLAG_IS_SET = 1;
+  private static final int IS_FOUND_FLAG_OFFSET_POSITION = 0;
 
 Review comment:
   Further, since is null is only going to be either 0 or 1, you save a byte per result and could use any of the high bits of this null byte to set whether or not you have found a value. Instead of an offset for null flag, you define a bit mask and your found check becomes something like `buf.get(position) & FOUND_MASK == FOUND_MASK` or whatever.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org