You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/13 07:01:55 UTC

[GitHub] [druid] suneet-s opened a new pull request #10390: WIP Vectorized variance aggregators

suneet-s opened a new pull request #10390:
URL: https://github.com/apache/druid/pull/10390


   ### Description
   
   Vectorize the variance aggregators.
   
   This PR is still missing tests. TODO: is it better to calculate variance for the vector as a batch or re-use the sliding window calculation?
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10390: WIP Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10390:
URL: https://github.com/apache/druid/pull/10390#discussion_r487537127



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceBufferAggregator.java
##########
@@ -213,9 +250,7 @@ public void aggregate(ByteBuffer buf, int position)
       count += holder2.count;
       sum += holder2.sum;
 
-      buf.putLong(position, count);
-      buf.putDouble(position + SUM_OFFSET, sum);
-      buf.putDouble(position + NVARIANCE_OFFSET, nvariance);
+      VarianceBufferAggregator.writeNVariance(buf, position, count, sum, nvariance);

Review comment:
       nit: 
   ```suggestion
         writeNVariance(buf, position, count, sum, nvariance);
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10390: Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10390:
URL: https://github.com/apache/druid/pull/10390#discussion_r488305025



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -162,10 +167,36 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     throw new IAE(
         "Incompatible type for metric[%s], expected a float, double, long, or variance, but got a %s",
         fieldName,
-        inputType
+        type
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
+  {
+    final String type = getTypeString(selectorFactory);
+    if (ValueType.FLOAT.name().equalsIgnoreCase(type)) {

Review comment:
       Ah, this is actually only true if `inputType` isn't specified, so it's not _never_ true, just will never be true if the input type isn't explicitly specified as `"variance"`, and cant autodetect from its inputs.
   
   I think it would be nice if you fix it up at least for the vectorized codepath, but would also probably be nice to just fix it up for all of them if it isn't too much trouble, and I don't think waiting on #10277 is necessary since there are a handful of aggregator factories which could be improved to have a stronger check on their complex input type.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10390: Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10390:
URL: https://github.com/apache/druid/pull/10390#discussion_r488319436



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -162,10 +167,36 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     throw new IAE(
         "Incompatible type for metric[%s], expected a float, double, long, or variance, but got a %s",
         fieldName,
-        inputType
+        type
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
+  {
+    final String type = getTypeString(selectorFactory);
+    if (ValueType.FLOAT.name().equalsIgnoreCase(type)) {

Review comment:
       Fix was easy for everything. Done with some tests. Waiting on a full review, and I'll push up a new patch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #10390: Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #10390:
URL: https://github.com/apache/druid/pull/10390


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10390: Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10390:
URL: https://github.com/apache/druid/pull/10390#discussion_r488292797



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -162,10 +167,36 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     throw new IAE(
         "Incompatible type for metric[%s], expected a float, double, long, or variance, but got a %s",
         fieldName,
-        inputType
+        type
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
+  {
+    final String type = getTypeString(selectorFactory);
+    if (ValueType.FLOAT.name().equalsIgnoreCase(type)) {

Review comment:
       Can I file a separate bug for this issue since it appears to be a bug independent of the vectorization code path? I can look at fixing that issue after #10277 goes in




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10390: WIP Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10390:
URL: https://github.com/apache/druid/pull/10390#discussion_r487536757



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -166,6 +169,32 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
+  {
+    final String type = getTypeString(selectorFactory);
+    if (ValueType.FLOAT.name().equalsIgnoreCase(type)) {
+      return new VarianceFloatVectorAggregator(selectorFactory.makeValueSelector(fieldName));
+    } else if (ValueType.DOUBLE.name().equalsIgnoreCase(type)) {
+      return new VarianceDoubleVectorAggregator(selectorFactory.makeValueSelector(fieldName));
+    } else if (ValueType.LONG.name().equalsIgnoreCase(type)) {
+      return new VarianceLongVectorAggregator(selectorFactory.makeValueSelector(fieldName));
+    } else if (VARIANCE_TYPE_NAME.equalsIgnoreCase(type)) {
+      return new VarianceObjectVectorAggregator(selectorFactory.makeObjectSelector(fieldName));
+    }
+    throw new IAE(
+        "Incompatible type for metric[%s], expected a float, double, long, or variance, but got a %s",
+        fieldName,
+        inputType

Review comment:
       It may be more informative to log `type` here instead since `inputType` could be null. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10390: WIP Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10390:
URL: https://github.com/apache/druid/pull/10390#discussion_r487569614



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -166,6 +169,32 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
+  {
+    final String type = getTypeString(selectorFactory);
+    if (ValueType.FLOAT.name().equalsIgnoreCase(type)) {
+      return new VarianceFloatVectorAggregator(selectorFactory.makeValueSelector(fieldName));
+    } else if (ValueType.DOUBLE.name().equalsIgnoreCase(type)) {
+      return new VarianceDoubleVectorAggregator(selectorFactory.makeValueSelector(fieldName));
+    } else if (ValueType.LONG.name().equalsIgnoreCase(type)) {
+      return new VarianceLongVectorAggregator(selectorFactory.makeValueSelector(fieldName));
+    } else if (VARIANCE_TYPE_NAME.equalsIgnoreCase(type)) {
+      return new VarianceObjectVectorAggregator(selectorFactory.makeObjectSelector(fieldName));
+    }
+    throw new IAE(
+        "Incompatible type for metric[%s], expected a float, double, long, or variance, but got a %s",
+        fieldName,
+        inputType

Review comment:
       good point. I just copy pasted the exception from above.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10390: Vectorized variance aggregators

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10390:
URL: https://github.com/apache/druid/pull/10390#discussion_r488175527



##########
File path: extensions-core/stats/src/main/java/org/apache/druid/query/aggregation/variance/VarianceAggregatorFactory.java
##########
@@ -162,10 +167,36 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     throw new IAE(
         "Incompatible type for metric[%s], expected a float, double, long, or variance, but got a %s",
         fieldName,
-        inputType
+        type
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
+  {
+    final String type = getTypeString(selectorFactory);
+    if (ValueType.FLOAT.name().equalsIgnoreCase(type)) {

Review comment:
       Oh, this isn't actually correct. This method is backed by `getType`, which means `VARIANCE_TYPE_NAME.equalsIgnoreCase(type)` will never be true.  I suggest instead of using `getTypeString` to get the column capabilities, and then this enum can switch on `getType` directly instead of converting them to and from strings, and if type is `ValueType.COMPLEX` then to treat it as the object aggregator factory.
   
   After #10277 goes in, then we can check `getComplexTypeName` matches `VARIANCE_TYPE_NAME`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org