You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/20 13:39:22 UTC

[GitHub] [druid] abhishekagarwal87 opened a new pull request #10304: WIP: Add vectorization for druid-histogram extension

abhishekagarwal87 opened a new pull request #10304:
URL: https://github.com/apache/druid/pull/10304


   
   ### Description
   
   This PR adds vectorization support for Aggregators in the `druid-histogram` extension. While these changes are unlikely to result in the usage of SIMD instructions, they can still help gain performance in two ways I can think of
     - Being more cache-friendly and less number of function calls
     - Enable vectorization for the whole query when one of the participating aggregator is `Approximate Histogram` assuming other aggregators in query support vectorization. 
   
   I am yet to add more tests hence the PR is in draft stage. 
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist above are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `MyFoo`
    * `OurBar`
    * `TheirBaz`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478285996



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogramAggregatorFactory.java
##########
@@ -99,6 +105,34 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
+  {
+    ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
+    if (null == capabilities) {
+      throw new IAE("could not find the column type for column %s", fieldName);
+    }
+    ValueType type = capabilities.getType();
+    if (type.isNumeric()) {
+      return new FixedBucketsHistogramVectorAggregator(
+          columnSelectorFactory.makeValueSelector(fieldName),
+          lowerLimit,
+          upperLimit,
+          numBuckets,
+          outlierHandlingMode
+      );
+    } else {
+      throw new IAE("cannot vectorize fixed bucket histogram aggregation for type %s", type);
+    }
+  }
+
+  @Override
+  public boolean canVectorize(ColumnInspector columnInspector)
+  {
+    ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return (capabilities != null) && capabilities.getType().isNumeric();

Review comment:
       Yes. I only meant to handle numeric types for now. It seems it can also take String (base64) as well as complex objects. I decided to not do that in the current PR given these extensions are also deprecated and not recommended for use anymore. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #10304:
URL: https://github.com/apache/druid/pull/10304


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-681970953


   Gather around everyone. I have got the results. Vectorization has improved performance. 
   ```
   Run 1 - ApproximateHistogram aggregator in query with float input type
   
   Benchmark                                              (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (vectorize)  Mode  Cnt       Score      Error  Units
   TimeseriesBenchmark.queryFilteredSingleQueryableIndex              1            750000           basic.A        false  avgt   15    9427.001 ±  175.655  us/op
   TimeseriesBenchmark.queryFilteredSingleQueryableIndex              1            750000           basic.A         true  avgt   15    9343.698 ±   67.070  us/op
   TimeseriesBenchmark.queryMultiQueryableIndex                       1            750000           basic.A        false  avgt   15   72868.337 ± 6823.255  us/op
   TimeseriesBenchmark.queryMultiQueryableIndex                       1            750000           basic.A         true  avgt   15   27122.215 ±  515.354  us/op
   TimeseriesBenchmark.querySingleQueryableIndex                      1            750000           basic.A        false  avgt   15   70254.501 ± 7000.412  us/op
   TimeseriesBenchmark.querySingleQueryableIndex                      1            750000           basic.A         true  avgt   15   30643.611 ± 4038.082  us/op
   
   
   Run 2 - ApproximateHistogramFolding aggregator in query with ApproximateHistogram input type
   
   
   Benchmark                                              (numSegments)  (rowsPerSegment)  (schemaAndQuery)  (vectorize)  Mode  Cnt       Score       Error  Units
   TimeseriesBenchmark.queryFilteredSingleQueryableIndex              1            750000           basic.A        false  avgt   15    9804.513 ±   268.530  us/op
   TimeseriesBenchmark.queryFilteredSingleQueryableIndex              1            750000           basic.A         true  avgt   15    9495.384 ±    55.576  us/op
   TimeseriesBenchmark.queryMultiQueryableIndex                       1            750000           basic.A        false  avgt   15  175868.359 ± 14097.956  us/op
   TimeseriesBenchmark.queryMultiQueryableIndex                       1            750000           basic.A         true  avgt   15  129778.787 ±  4278.879  us/op
   TimeseriesBenchmark.querySingleQueryableIndex                      1            750000           basic.A        false  avgt   15  147838.235 ±  2350.725  us/op
   TimeseriesBenchmark.querySingleQueryableIndex                      1            750000           basic.A         true  avgt   15  133302.428 ±  2139.185  us/op
   
   
   
   
   Run 3 - FixedBucketHistogram aggregator in query with long input type
   
   
   TimeseriesBenchmark.queryFilteredSingleQueryableIndex              1            750000           basic.A        false  avgt   15   9426.354 ±   94.798  us/op
   TimeseriesBenchmark.queryFilteredSingleQueryableIndex              1            750000           basic.A         true  avgt   15   9354.156 ±   83.472  us/op
   TimeseriesBenchmark.queryMultiQueryableIndex                       1            750000           basic.A        false  avgt   15  72228.744 ± 9069.249  us/op
   TimeseriesBenchmark.queryMultiQueryableIndex                       1            750000           basic.A         true  avgt   15  26900.564 ±  410.899  us/op
   TimeseriesBenchmark.querySingleQueryableIndex                      1            750000           basic.A        false  avgt   15  71854.254 ± 6718.542  us/op
   TimeseriesBenchmark.querySingleQueryableIndex                      1            750000           basic.A         true  avgt   15  26808.244 ±  756.499  us/op
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r481953935



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramBufferAggregatorInternal.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramBufferAggregator} and {@link ApproximateHistogramVectorAggregator}
+ * for aggregation operations on byte buffers. Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramBufferAggregatorInternal

Review comment:
       super nitpick, feel free to ignore, but maybe consider naming this (and similar classes) to something like `ApproximateHistogramBufferAggregatorHelper` instead of `ApproximateHistogramBufferAggregatorInternal` to be more consistent with the naming of this style of class with the rest of the codebase. I looked around and this PR has the only classes with an `Internal` suffix but there are many with the `Helper` suffix, and is consistent with the javadoc for this class. 

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramVectorAggregator.java
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ApproximateHistogramVectorAggregator implements VectorAggregator
+{
+
+  private final VectorValueSelector selector;
+  private final ApproximateHistogramBufferAggregatorInternal innerAggregator;
+
+  public ApproximateHistogramVectorAggregator(
+      VectorValueSelector selector,
+      int resolution
+  )
+  {
+    this.selector = selector;
+    this.innerAggregator = new ApproximateHistogramBufferAggregatorInternal(resolution);
+  }
+
+  @Override
+  public void init(final ByteBuffer buf, final int position)
+  {
+    innerAggregator.init(buf, position);
+  }
+
+  @Override
+  public void aggregate(final ByteBuffer buf, final int position, final int startRow, final int endRow)
+  {
+    final boolean[] isValueNull = selector.getNullVector();
+    final float[] vector = selector.getFloatVector();
+    ApproximateHistogram histogram = innerAggregator.get(buf, position);
+
+    for (int i = startRow; i < endRow; i++) {
+      if (isValueNull != null && isValueNull[i]) {

Review comment:
       Ah yeah i wasn't imagining checking all the conditions in the loop, the `checkNulls` value I was thinking of would be in the loop, similar to `hasNulls` in your example. Thinking further about it though, there is no real need/advantage to checking `NullHandling.sqlCompatible()`.

##########
File path: extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramVectorAggregatorTest.java
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnCapabilitiesImpl;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.easymock.EasyMock;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+
+import static org.easymock.EasyMock.createMock;
+import static org.easymock.EasyMock.expect;
+
+public class ApproximateHistogramVectorAggregatorTest

Review comment:
       it isn't obvious from this PR, but out of curiosity are there any tests which confirm that the vectorized aggregator results match the non-vectorized output?

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregatorFactory.java
##########
@@ -102,6 +105,21 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory metricVectorFactory)
+  {
+    return new ApproximateHistogramVectorAggregator(
+        metricVectorFactory.makeValueSelector(fieldName),
+        resolution
+    );
+  }
+
+  @Override
+  public boolean canVectorize(ColumnInspector columnInspector)
+  {
+    return true;

Review comment:
       We need to handle it _somehow_ because if not it will fail when making the value selector (because there is no string value selector) ```org.apache.druid.query.QueryInterruptedException: Cannot make VectorValueSelector for column with class[org.apache.druid.segment.column.StringDictionaryEncodedColumn]```. This is inconsistent with the non-vectorized behavior, which treats the input as `0` from the dimension selectors.
   
   The ways it can be handled are with either the `canVectorize` method checking explicitly for numeric types, or special handling in `factorizeVector` to use a nil vector selector instead of trying to make a value selector. You probably want similar checks for other agg factories, as is appropriate for the types they handle.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r482189014



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregatorFactory.java
##########
@@ -102,6 +105,21 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory metricVectorFactory)
+  {
+    return new ApproximateHistogramVectorAggregator(
+        metricVectorFactory.makeValueSelector(fieldName),
+        resolution
+    );
+  }
+
+  @Override
+  public boolean canVectorize(ColumnInspector columnInspector)
+  {
+    return true;

Review comment:
       Ack




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478290597



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramVectorAggregator.java
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ApproximateHistogramVectorAggregator implements VectorAggregator
+{
+
+  private final VectorValueSelector selector;
+  private final ApproximateHistogramBufferAggregatorInternal innerAggregator;
+
+  public ApproximateHistogramVectorAggregator(
+      VectorValueSelector selector,
+      int resolution
+  )
+  {
+    this.selector = selector;
+    this.innerAggregator = new ApproximateHistogramBufferAggregatorInternal(resolution);
+  }
+
+  @Override
+  public void init(final ByteBuffer buf, final int position)
+  {
+    innerAggregator.init(buf, position);
+  }
+
+  @Override
+  public void aggregate(final ByteBuffer buf, final int position, final int startRow, final int endRow)
+  {
+    final boolean[] isValueNull = selector.getNullVector();
+    final float[] vector = selector.getFloatVector();
+    ApproximateHistogram histogram = innerAggregator.get(buf, position);
+
+    for (int i = startRow; i < endRow; i++) {
+      if (isValueNull != null && isValueNull[i]) {

Review comment:
       How about if I take `isValueNull != null` out of loop and line L57 becomes
   
   boolean hasNulls = isValueNull != null;
   ```
   for (int i = startRow; i < endRow; i++) {
      if (hasNulls && isValueNull[i]) {
   ```
   
   This way, I don't need to introduce another predicate in my if condition and number of null checks will still be reduced somewhat. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478294622



##########
File path: extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramVectorAggregatorTest.java
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnCapabilitiesImpl;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.easymock.EasyMock;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+
+import static org.easymock.EasyMock.createMock;
+import static org.easymock.EasyMock.expect;
+
+public class ApproximateHistogramVectorAggregatorTest
+{
+  private static final float[] FLOATS = {23, 19, 10, 16, 36, 2, 9, 32, 30, 45, 33};   // Last value is never included
+  private static final boolean[] NULL_VECTOR =
+      {false, false, false, false, false, false, false, false, false, false, true};
+  private VectorColumnSelectorFactory vectorColumnSelectorFactory;
+
+  @Before
+  public void setup()
+  {
+    NullHandling.initializeForTests();
+    VectorValueSelector vectorValueSelector_1 = createMock(VectorValueSelector.class);
+    expect(vectorValueSelector_1.getFloatVector()).andReturn(FLOATS).anyTimes();
+    expect(vectorValueSelector_1.getNullVector()).andReturn(NULL_VECTOR).anyTimes();
+
+    VectorValueSelector vectorValueSelector_2 = createMock(VectorValueSelector.class);
+    expect(vectorValueSelector_2.getFloatVector()).andReturn(FLOATS).anyTimes();
+    expect(vectorValueSelector_2.getNullVector()).andReturn(null).anyTimes();
+
+    EasyMock.replay(vectorValueSelector_1);
+    EasyMock.replay(vectorValueSelector_2);
+
+    ColumnCapabilities columnCapabilities
+        = new ColumnCapabilitiesImpl().setType(ValueType.DOUBLE).setDictionaryEncoded(true);

Review comment:
       Ack




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: WIP: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-677723556


   This pull request **fixes 1 alert** when merging 55401cde3715db3baec2d617cbcef7131713fee2 into b36dab0fe65557e4aaf675423d61bdaa51501a71 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-f7d906ed2de4ea503e82cd3e9e84842531b0b843)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478863295



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);

Review comment:
       Thanks for the explanation. I didn't realize that these limits are transient.  I will add a similar comment here for future reference. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r474782276



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);
+    left.setUpperLimit(upperLimit);
+    left.foldFast(right, tmpBufferA, tmpBufferB);

Review comment:
       This is a copy of old implementation. However, I noticed that `ApproximateHistogramAggregator` has the following implementation instead which looks more correct. calling `foldFast` with inadequate space results in an exception. 
   ```
    if (left.binCount() + right.binCount() <= tmpBufferB.length) {
         left.foldFast(right, tmpBufferA, tmpBufferB);
       } else {
         left.foldFast(right);
       }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-686463253


   This pull request **fixes 1 alert** when merging adfb1353cb2b0fb7c8f52232f5c14e06b74f3496 into 3fc8bc0701938b532282b6b50398c0ee6503a517 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-9362d02d2d54776b3dbfa9851741fdcc7e6a79da)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478280502



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramBufferAggregator.java
##########
@@ -28,54 +28,30 @@
 public class ApproximateHistogramBufferAggregator implements BufferAggregator
 {
   private final BaseFloatColumnValueSelector selector;
-  private final int resolution;
+  private final ApproximateHistogramBufferAggregatorInternal innerAggregator;

Review comment:
       Yes but decided against it as static methods are not so good when it comes to unit testing. I can pass mock dependencies in the constructor, unlike the static methods. Then, some classes also have temporary buffers as a state which I can put inside the instances with common functionality. E.g. `ApproximateHistogramFoldingBufferAggregatorInternal`  has temporary buffers created just once.  I could make these temp buffers static too but then synchronization issues kick in. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478793614



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);
+    left.setUpperLimit(upperLimit);
+    left.foldFast(right, tmpBufferA, tmpBufferB);

Review comment:
       Hm, what was the exception you saw? 
   
   From looking at the code it seems like the buffers allocated in `ApproximateHistogram.foldRule` when `foldFast` is called with a single argument should be the same size as `tmpBufferA` and `tmpBufferB`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: WIP: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r474040705



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogramBufferAggregatorInternal.java
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.common.config.NullHandling;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class FixedBucketsHistogramBufferAggregatorInternal
+{
+  private final double lowerLimit;
+  private final double upperLimit;
+  private final int numBuckets;
+  private final FixedBucketsHistogram.OutlierHandlingMode outlierHandlingMode;
+
+  public FixedBucketsHistogramBufferAggregatorInternal(
+      double lowerLimit,
+      double upperLimit,
+      int numBuckets,
+      FixedBucketsHistogram.OutlierHandlingMode outlierHandlingMode
+  )
+  {
+
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+    this.numBuckets = numBuckets;
+    this.outlierHandlingMode = outlierHandlingMode;
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    FixedBucketsHistogram histogram = new FixedBucketsHistogram(
+        lowerLimit,
+        upperLimit,
+        numBuckets,
+        outlierHandlingMode
+    );
+    mutationBuffer.put(histogram.toBytesFull(false));
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable Object val)
+  {
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    FixedBucketsHistogram h0 = FixedBucketsHistogram.fromByteBufferFullNoSerdeHeader(mutationBuffer);
+    combine(h0, val);
+
+    mutationBuffer.position(position);
+    mutationBuffer.put(h0.toBytesFull(false));
+  }
+
+  public void combine(FixedBucketsHistogram histogram, @Nullable Object next)
+  {
+    if (next == null) {
+      if (NullHandling.replaceWithDefault()) {

Review comment:
       I might have carried forward a bug here. This if/else should most likely be inverted. cc @jon-wei 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-682098260


   This pull request **fixes 1 alert** when merging 19b2b72f061c0373dc7910e201999a62b557b2c5 into f82fd22fa7de175200b7127c34c2eb2900bf7317 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-2ac41f6a54a170a086bf58d329da4060fba9e54f)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-678382987


   This pull request **fixes 1 alert** when merging b47c90634c746d66812736b6202a976ea7dc7cb2 into 7620b0c54e31ed466adf149b2b7fbd815c4c70ce - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-0f17ccdf61a692aa391400c74abe632a1e6ab87f)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r482189323



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramBufferAggregatorInternal.java
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramBufferAggregator} and {@link ApproximateHistogramVectorAggregator}
+ * for aggregation operations on byte buffers. Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramBufferAggregatorInternal

Review comment:
       Its good to be consistent. I will rename these classes. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478283023



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogramAggregatorFactory.java
##########
@@ -99,6 +105,34 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
+  {
+    ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
+    if (null == capabilities) {

Review comment:
       I have seen this trend in a few places where an extra guard is put up. That the caller may not call `canVectorize` before `factorizeVector`.  I can remove it ifs unnecessarily defensive. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-686417266


   This pull request **fixes 1 alert** when merging 09ace2d0b9aaf40082891b7f34e2b4420a3c8a3d into 3fc8bc0701938b532282b6b50398c0ee6503a517 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-e39eb50b1f0007908f67d6051ebc5b743ef9c359)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r482190591



##########
File path: extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramVectorAggregatorTest.java
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnCapabilitiesImpl;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.easymock.EasyMock;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+
+import static org.easymock.EasyMock.createMock;
+import static org.easymock.EasyMock.expect;
+
+public class ApproximateHistogramVectorAggregatorTest

Review comment:
       There isn't a test that runs on both non-vectorized and vectorized at the same time. Though the input/output used in vector aggregator tests is almost same as what is used in tests for non-vector aggregator. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-682391980


   This pull request **fixes 1 alert** when merging db59ddd104cd2fed2e82db5780bf9d54d4a6b188 into f82fd22fa7de175200b7127c34c2eb2900bf7317 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-3aeca08d3357a6a6788fc9e545ef653a4d7eb119)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-682444755


   This pull request **fixes 1 alert** when merging 182b6102d774eddb1ef06383ae79c1c30c9df115 into f82fd22fa7de175200b7127c34c2eb2900bf7317 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-ca77626dfefb2ba9fec39a675f1e44179a66749b)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r474780398



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);

Review comment:
       this looks unnecessary




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478327250



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregatorFactory.java
##########
@@ -102,6 +105,21 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory metricVectorFactory)
+  {
+    return new ApproximateHistogramVectorAggregator(
+        metricVectorFactory.makeValueSelector(fieldName),
+        resolution
+    );
+  }
+
+  @Override
+  public boolean canVectorize(ColumnInspector columnInspector)
+  {
+    return true;

Review comment:
       The approximate histogram aggregators do not handle the strings. The way I see it, `canVectorize` only indicates when the aggregation cannot be vectorized. E.g. fixed bucket aggregator factory can aggregate complex objects but not with vectorization and this is what `canVectorize` in fixedBucket**Factory checks. 
   
   In this class, any type that can be aggregated in regular aggregators, is supported by vector aggregator as well. Any type that is not supported by regular aggregator is not supported by vector aggregator as well. Hence the method `canVectorize` just returns true. 
   
   do you think it would still make sense to check for input type? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478865615



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);
+    left.setUpperLimit(upperLimit);
+    left.foldFast(right, tmpBufferA, tmpBufferB);

Review comment:
       can't reproduce it now. I am not sure either why `ApproximateHistogramAggregator` will have a different implementation. Will leave it as it is. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478281326



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);

Review comment:
       Ack




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r477198847



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramBufferAggregator.java
##########
@@ -28,54 +28,30 @@
 public class ApproximateHistogramBufferAggregator implements BufferAggregator
 {
   private final BaseFloatColumnValueSelector selector;
-  private final int resolution;
+  private final ApproximateHistogramBufferAggregatorInternal innerAggregator;

Review comment:
       Did you consider making the shared functionality just be static methods to be more consistent with how `HyperUniquesBufferAggregator` and `HyperUniquesVectorAggregator` are implemented? This is totally nitpicking, but something just seems off about these things having a thing called 'innerAggregator' that doesn't implement any of the aggregator interfaces.

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogram.java
##########
@@ -431,6 +433,33 @@ public void incrementMissing()
     }
   }
 
+  /**
+   * Merge another datapoint into this one. The other datapoin could be

Review comment:
       typo: 'datapoin' -> 'datapoint'

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogramAggregatorFactory.java
##########
@@ -99,6 +105,34 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
+  {
+    ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
+    if (null == capabilities) {

Review comment:
       I think this isn't possible since `canVectorize` checks that capabilities isn't null

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogramAggregatorFactory.java
##########
@@ -99,6 +105,34 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory columnSelectorFactory)
+  {
+    ColumnCapabilities capabilities = columnSelectorFactory.getColumnCapabilities(fieldName);
+    if (null == capabilities) {
+      throw new IAE("could not find the column type for column %s", fieldName);
+    }
+    ValueType type = capabilities.getType();
+    if (type.isNumeric()) {
+      return new FixedBucketsHistogramVectorAggregator(
+          columnSelectorFactory.makeValueSelector(fieldName),
+          lowerLimit,
+          upperLimit,
+          numBuckets,
+          outlierHandlingMode
+      );
+    } else {
+      throw new IAE("cannot vectorize fixed bucket histogram aggregation for type %s", type);
+    }
+  }
+
+  @Override
+  public boolean canVectorize(ColumnInspector columnInspector)
+  {
+    ColumnCapabilities capabilities = columnInspector.getColumnCapabilities(fieldName);
+    return (capabilities != null) && capabilities.getType().isNumeric();

Review comment:
       Did you mean to only handle numeric primitive inputs? The input type could also be complex if you handle fixed bucket histogram inputs, but you would need another vector aggregator implementation I think that takes an object selector instead of value selector

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramVectorAggregator.java
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class ApproximateHistogramVectorAggregator implements VectorAggregator
+{
+
+  private final VectorValueSelector selector;
+  private final ApproximateHistogramBufferAggregatorInternal innerAggregator;
+
+  public ApproximateHistogramVectorAggregator(
+      VectorValueSelector selector,
+      int resolution
+  )
+  {
+    this.selector = selector;
+    this.innerAggregator = new ApproximateHistogramBufferAggregatorInternal(resolution);
+  }
+
+  @Override
+  public void init(final ByteBuffer buf, final int position)
+  {
+    innerAggregator.init(buf, position);
+  }
+
+  @Override
+  public void aggregate(final ByteBuffer buf, final int position, final int startRow, final int endRow)
+  {
+    final boolean[] isValueNull = selector.getNullVector();
+    final float[] vector = selector.getFloatVector();
+    ApproximateHistogram histogram = innerAggregator.get(buf, position);
+
+    for (int i = startRow; i < endRow; i++) {
+      if (isValueNull != null && isValueNull[i]) {

Review comment:
       you can also ignore null checks entirely if `NullHandling.sqlCompatible()` is `true`, would suggest saving it as a private final field in the constructor and then maybe add something like `final boolean checkNulls = hasNulls && isValueNull != null`

##########
File path: extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramVectorAggregatorTest.java
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnCapabilitiesImpl;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.easymock.EasyMock;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+
+import static org.easymock.EasyMock.createMock;
+import static org.easymock.EasyMock.expect;
+
+public class ApproximateHistogramVectorAggregatorTest
+{
+  private static final float[] FLOATS = {23, 19, 10, 16, 36, 2, 9, 32, 30, 45, 33};   // Last value is never included
+  private static final boolean[] NULL_VECTOR =
+      {false, false, false, false, false, false, false, false, false, false, true};
+  private VectorColumnSelectorFactory vectorColumnSelectorFactory;
+
+  @Before
+  public void setup()
+  {
+    NullHandling.initializeForTests();
+    VectorValueSelector vectorValueSelector_1 = createMock(VectorValueSelector.class);
+    expect(vectorValueSelector_1.getFloatVector()).andReturn(FLOATS).anyTimes();
+    expect(vectorValueSelector_1.getNullVector()).andReturn(NULL_VECTOR).anyTimes();
+
+    VectorValueSelector vectorValueSelector_2 = createMock(VectorValueSelector.class);
+    expect(vectorValueSelector_2.getFloatVector()).andReturn(FLOATS).anyTimes();
+    expect(vectorValueSelector_2.getNullVector()).andReturn(null).anyTimes();
+
+    EasyMock.replay(vectorValueSelector_1);
+    EasyMock.replay(vectorValueSelector_2);
+
+    ColumnCapabilities columnCapabilities
+        = new ColumnCapabilitiesImpl().setType(ValueType.DOUBLE).setDictionaryEncoded(true);

Review comment:
       nit: suggest `ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE)` since it will create realistic double capabilities (numbers are not dictionary encoded for example)

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogram.java
##########
@@ -431,6 +433,33 @@ public void incrementMissing()
     }
   }
 
+  /**
+   * Merge another datapoint into this one. The other datapoin could be
+   *  - base64 encoded string of {@code FixedBucketsHistogram}
+   *  - {@code FixedBucketsHistogram} object
+   *  - Numeric value
+   *
+   * @param val
+   */
+  void combine(@Nullable Object val)

Review comment:
       I'm not sure this function should be shared between the vectorized and non-vectorized aggregator. For the vector aggregator the `if` should probably be outside of the `for` loop i think, because the contents of the vector will be consistent throughout the loop.
   
   Also, I think you might need different selectors depending on if the inputs to the aggregator are numeric primitives (value selector to get double vector and null boolean vector), or if the input is other fixed bucket histogram sketches (object selector to get array of histogram objects). The fixed bucket histogram aggregator is a combined primitive and sketch merging aggregator, unlike the approximate histogram aggregators which are split and handles the sketch inputs and result merges with the 'fold' aggregators.

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);

Review comment:
       A quick look and I think I agree, but am not totally certain. Can you try to find out if it is needed so we can remove this TODO and either remove the code, or add a comment on why it needs to be here?

##########
File path: extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogramVectorAggregatorTest.java
##########
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnCapabilitiesImpl;
+import org.apache.druid.segment.column.ValueType;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.easymock.EasyMock;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.ByteBuffer;
+
+import static org.easymock.EasyMock.createMock;
+import static org.easymock.EasyMock.expect;
+
+public class FixedBucketsHistogramVectorAggregatorTest
+{
+  private static final double[] DOUBLES = {1.0, 12.0, 3.0, 14.0, 15.0, 16.0};
+  private static final boolean[] NULL_VECTOR = {false, false, false, false, true, false};
+  private VectorColumnSelectorFactory vectorColumnSelectorFactory;
+
+  @Before
+  public void setup()
+  {
+    NullHandling.initializeForTests();
+    VectorValueSelector vectorValueSelector_1 = createMock(VectorValueSelector.class);
+    expect(vectorValueSelector_1.getDoubleVector()).andReturn(DOUBLES).anyTimes();
+    expect(vectorValueSelector_1.getNullVector()).andReturn(NULL_VECTOR).anyTimes();
+
+    VectorValueSelector vectorValueSelector_2 = createMock(VectorValueSelector.class);
+    expect(vectorValueSelector_2.getDoubleVector()).andReturn(DOUBLES).anyTimes();
+    expect(vectorValueSelector_2.getNullVector()).andReturn(null).anyTimes();
+
+    EasyMock.replay(vectorValueSelector_1);
+    EasyMock.replay(vectorValueSelector_2);
+
+    ColumnCapabilities columnCapabilities
+        = new ColumnCapabilitiesImpl().setType(ValueType.DOUBLE).setDictionaryEncoded(true);

Review comment:
       nit: same comment about capabilities

##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregatorFactory.java
##########
@@ -102,6 +105,21 @@ public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
     );
   }
 
+  @Override
+  public VectorAggregator factorizeVector(VectorColumnSelectorFactory metricVectorFactory)
+  {
+    return new ApproximateHistogramVectorAggregator(
+        metricVectorFactory.makeValueSelector(fieldName),
+        resolution
+    );
+  }
+
+  @Override
+  public boolean canVectorize(ColumnInspector columnInspector)
+  {
+    return true;

Review comment:
       should this check if the column is numeric or complex similar to the fixed buckets aggregator factory? I don't think we have a good way for aggregators to handle string inputs in vectorized engine yet either, unless you use `SingleValueDimensionVectorSelector` or `MultiValueDimensionVectorSelector` and lookup the string values for the int arrays yourself, so should probably exclude strings at least (not that they make much sense as an input anyway).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #10304:
URL: https://github.com/apache/druid/pull/10304#issuecomment-680045262


   This pull request **fixes 1 alert** when merging 81e72a51acadeebf324ee133f22b15ce5d63e5d4 into f53785c52cef372dd5c456260329dfd5e39011e2 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-daf38292670581d94db2b2601f6c0019f24ebe48)
   
   **fixed alerts:**
   
   * 1 for Boxed variable is never null


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478293295



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/FixedBucketsHistogram.java
##########
@@ -431,6 +433,33 @@ public void incrementMissing()
     }
   }
 
+  /**
+   * Merge another datapoint into this one. The other datapoin could be
+   *  - base64 encoded string of {@code FixedBucketsHistogram}
+   *  - {@code FixedBucketsHistogram} object
+   *  - Numeric value
+   *
+   * @param val
+   */
+  void combine(@Nullable Object val)

Review comment:
       Good point. I will let this method remain here. Since I am only tackling numeric values for now, my vector implementation can call `add` directly on fixed histogram. will make that change. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #10304: Add vectorization for druid-histogram extension

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #10304:
URL: https://github.com/apache/druid/pull/10304#discussion_r478782925



##########
File path: extensions-core/histogram/src/main/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramFoldingBufferAggregatorInternal.java
##########
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.histogram;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * A helper class used by {@link ApproximateHistogramFoldingBufferAggregator} and
+ * {@link ApproximateHistogramFoldingVectorAggregator} for aggregation operations on byte buffers.
+ * Getting the object from value selectors is outside this class.
+ */
+final class ApproximateHistogramFoldingBufferAggregatorInternal
+{
+  private final int resolution;
+  private final float upperLimit;
+  private final float lowerLimit;
+
+  private float[] tmpBufferA;
+  private long[] tmpBufferB;
+
+  public ApproximateHistogramFoldingBufferAggregatorInternal(
+      int resolution,
+      float lowerLimit,
+      float upperLimit
+  )
+  {
+    this.resolution = resolution;
+    this.lowerLimit = lowerLimit;
+    this.upperLimit = upperLimit;
+
+    tmpBufferA = new float[resolution];
+    tmpBufferB = new long[resolution];
+  }
+
+  public void init(ByteBuffer buf, int position)
+  {
+    ApproximateHistogram h = new ApproximateHistogram(resolution, lowerLimit, upperLimit);
+
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+    // use dense storage for aggregation
+    h.toBytesDense(mutationBuffer);
+  }
+
+  public void aggregate(ByteBuffer buf, int position, @Nullable ApproximateHistogram hNext)
+  {
+    if (hNext == null) {
+      return;
+    }
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    ApproximateHistogram h0 = ApproximateHistogram.fromBytesDense(mutationBuffer);
+    foldFast(h0, hNext);
+
+    mutationBuffer.position(position);
+    h0.toBytesDense(mutationBuffer);
+  }
+
+  public void foldFast(ApproximateHistogram left, ApproximateHistogram right)
+  {
+    //TODO: do these have to set in every call
+    left.setLowerLimit(lowerLimit);

Review comment:
       I think it's necessary, the `fromBytesDense` call ends up using this constructor which uses a default lower/upper limit:
   
   ```
    public ApproximateHistogram(int binCount, float[] positions, long[] bins, float min, float max)
     {
       this(
           positions.length,        //size
           positions,               //positions
           bins,                    //bins
           binCount,                //binCount
           min,                     //min
           max,                     //max
           sumBins(bins, binCount), //count
           Float.NEGATIVE_INFINITY, //lowerLimit
           Float.POSITIVE_INFINITY  //upperLimit
       );
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org