You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/01 23:56:41 UTC

[GitHub] [druid] suneet-s opened a new pull request #10340: Fix VARIANCE aggregator comparator

suneet-s opened a new pull request #10340:
URL: https://github.com/apache/druid/pull/10340


   ### Description
   
   The comparator for the variance aggregator used to compare values using the
   count. This is now fixed to compare values using the variance. If the variance
   is equal, the count and sum are used as tie breakers.
   
   This PR has:
   - [ ] been self-reviewed.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10340:
URL: https://github.com/apache/druid/pull/10340#discussion_r482515733



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -239,7 +240,9 @@ public Comparator getComparator()
   @Override
   public Object finalizeComputation(@Nullable Object object)
   {
-    return object == null ? null : ((VarianceAggregatorCollector) object).getVariance(isVariancePop);
+    return object == null || ((VarianceAggregatorCollector) object).count == 0
+           ? null

Review comment:
       Hmm, I wonder if this should be `NullHandling.defaultDoubleValue()`.
   
   Will this just happen if nothing gets read?

##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorCollector.java
##########
@@ -60,11 +60,11 @@ public static VarianceAggregatorCollector from(ByteBuffer buffer)
   }
 
   public static final Comparator<VarianceAggregatorCollector> COMPARATOR = (o1, o2) -> {
-    int compare = Longs.compare(o1.count, o2.count);
+    int compare = Doubles.compare(o1.nvariance, o2.nvariance);

Review comment:
       Will this sort the same way as `getVariance(isVariancePop)` in `finalizeComputation`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10340:
URL: https://github.com/apache/druid/pull/10340#discussion_r483061296



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorCollector.java
##########
@@ -60,11 +60,11 @@ public static VarianceAggregatorCollector from(ByteBuffer buffer)
   }
 
   public static final Comparator<VarianceAggregatorCollector> COMPARATOR = (o1, o2) -> {
-    int compare = Longs.compare(o1.count, o2.count);
+    int compare = Doubles.compare(o1.nvariance, o2.nvariance);

Review comment:
       I think so. I based this on the comment against the `nvariance` variable
   
   ```
   double nvariance; // sum[x-avg^2] (this is actually n times of the variance)
   ```
   
   `getVariance(isVariancePop)` looks like it divides by the count to get variance instead of nvariance.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10340:
URL: https://github.com/apache/druid/pull/10340#discussion_r483160224



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -239,7 +240,9 @@ public Comparator getComparator()
   @Override
   public Object finalizeComputation(@Nullable Object object)
   {
-    return object == null ? null : ((VarianceAggregatorCollector) object).getVariance(isVariancePop);
+    return object == null || ((VarianceAggregatorCollector) object).count == 0
+           ? null

Review comment:
       I approved the patch, but please also change the comment to become accurate. Perhaps also include a test like the empty timeseries one I mentioned.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
suneet-s merged pull request #10340:
URL: https://github.com/apache/druid/pull/10340


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10340:
URL: https://github.com/apache/druid/pull/10340#discussion_r483063690



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -239,7 +240,9 @@ public Comparator getComparator()
   @Override
   public Object finalizeComputation(@Nullable Object object)
   {
-    return object == null ? null : ((VarianceAggregatorCollector) object).getVariance(isVariancePop);
+    return object == null || ((VarianceAggregatorCollector) object).count == 0
+           ? null

Review comment:
       I think so. I was able to hit this bug because of `VarianceSqlAggregatorTest#testVarianceOrderBy` One of the columns had no value for a row that was being grouped on.
   
   I then read this comment in `getVariance(...)` 
   
   ```
   // in SQL standard, we should return null for zero elements. But druid there should not be such a case
   ```
   which is why I returned null.
   
   I guess it could be nicer to return `NullHandling.defaultDoubleValue()` instead




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10340:
URL: https://github.com/apache/druid/pull/10340#discussion_r483158776



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -239,7 +240,9 @@ public Comparator getComparator()
   @Override
   public Object finalizeComputation(@Nullable Object object)
   {
-    return object == null ? null : ((VarianceAggregatorCollector) object).getVariance(isVariancePop);
+    return object == null || ((VarianceAggregatorCollector) object).count == 0
+           ? null

Review comment:
       Ah, that comment is out of date. It used to be true but now timeseries queries can return a single row, SQL-style, even if there's no input data. There's a test for it in TImeseriesQueryRunnerTest -> testEmptyTimeseries. It looks like it's expecting `NullHandling.defaultDoubleValue()` for the aggregators it tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10340:
URL: https://github.com/apache/druid/pull/10340#discussion_r483061296



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorCollector.java
##########
@@ -60,11 +60,11 @@ public static VarianceAggregatorCollector from(ByteBuffer buffer)
   }
 
   public static final Comparator<VarianceAggregatorCollector> COMPARATOR = (o1, o2) -> {
-    int compare = Longs.compare(o1.count, o2.count);
+    int compare = Doubles.compare(o1.nvariance, o2.nvariance);

Review comment:
       I think so. I based this on the comment against the `nvariance` variable
   
   ```
   double nvariance; // sum[x-avg^2] (this is actually n times of the variance)
   ```
   
   `getVariance(isVariancePop)` looks like it divides by the count to get variance instead of n * variance.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10340: Fix VARIANCE aggregator comparator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10340:
URL: https://github.com/apache/druid/pull/10340#discussion_r483164648



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -239,7 +240,9 @@ public Comparator getComparator()
   @Override
   public Object finalizeComputation(@Nullable Object object)
   {
-    return object == null ? null : ((VarianceAggregatorCollector) object).getVariance(isVariancePop);
+    return object == null || ((VarianceAggregatorCollector) object).count == 0
+           ? null

Review comment:
       👍 I'll add the test and push up a new patch




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org