You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/15 07:51:32 UTC
[GitHub] [druid] somu-imply opened a new pull request, #12439: Vectorize latest
somu-imply opened a new pull request, #12439:
URL: https://github.com/apache/druid/pull/12439
The `latest` aggregator is not vectorized. This causes queries to take up a lot of time while running this aggregator. This is the first part of a work in progress where the longLast, floatLast and doubleLast aggregators are vectorized. The three have been added to the benchmark tests as well where there has been a 50% reduction in time.
```
Benchmark (query) (rowsPerSegment) (vectorize) Mode Cnt Score Error Units
SqlExpressionBenchmark.querySql 38 5000000 false avgt 5 176.557 ± 9.317 ms/op
SqlExpressionBenchmark.querySql 38 5000000 force avgt 5 95.854 ± 3.002 ms/op
SqlExpressionBenchmark.querySql 39 5000000 false avgt 5 139.142 ± 7.784 ms/op
SqlExpressionBenchmark.querySql 39 5000000 force avgt 5 65.736 ± 5.199 ms/op
SqlExpressionBenchmark.querySql 40 5000000 false avgt 5 129.723 ± 43.386 ms/op
SqlExpressionBenchmark.querySql 40 5000000 force avgt 5 51.946 ± 4.960 ms/op
```
where
```
38-->LongLast
39-->DoubleLast
40-->FloatLast
```
This is a **work under progress as of now**.
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a diff in pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r857254656
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/LongLastVectorAggregator.java:
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.collections.SerializablePair;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * Vectorized version of on heap 'last' aggregator for column selectors with type LONG..
+ */
+public class LongLastVectorAggregator extends NumericLastVectorAggregator
+{
+ long lastValue;
+
+ public LongLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ super(timeSelector, valueSelector);
+ lastValue = 0;
+ }
+
+ @Override
+ void initValue(ByteBuffer buf, int position)
+ {
+ buf.putLong(position, Long.MIN_VALUE);
Review Comment:
```suggestion
buf.putLong(position, 0);
```
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -836,7 +868,7 @@ public void testAnyAggregatorsOffHeapNumericNulls() throws Exception
public void testPrimitiveLatestInSubquery() throws Exception
{
// Cannot vectorize LATEST aggregator.
Review Comment:
```suggestion
```
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -882,6 +914,43 @@ public void testPrimitiveLatestInSubquery() throws Exception
);
}
+ @Test
+ public void testPrimitiveLatestInSubqueryGroupBy() throws Exception
+ {
+ // Cannot vectorize LATEST aggregator.
+ // skipVectorize();
Review Comment:
```suggestion
```
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -836,7 +868,7 @@ public void testAnyAggregatorsOffHeapNumericNulls() throws Exception
public void testPrimitiveLatestInSubquery() throws Exception
{
// Cannot vectorize LATEST aggregator.
- skipVectorize();
+ //skipVectorize();
Review Comment:
Please remove the function and the comment above from the tests that have now been vectorized.
```suggestion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] TSFenwick commented on a diff in pull request #12439: Vectorize numeric latest aggregators
Posted by GitBox <gi...@apache.org>.
TSFenwick commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r861380038
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+/**
+ * Base type for vectorized version of on heap 'last' aggregator for primitive numeric column selectors..
+ */
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ static final int NULL_OFFSET = Long.BYTES;
+ static final int VALUE_OFFSET = NULL_OFFSET + Byte.BYTES;
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ private long lastTime;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ }
+
+ @Override
+ public void init(ByteBuffer buf, int position)
+ {
+ buf.putLong(position, Long.MIN_VALUE);
+ buf.put(position + NULL_OFFSET, useDefault ? NullHandling.IS_NOT_NULL_BYTE : NullHandling.IS_NULL_BYTE);
+ initValue(buf, position + VALUE_OFFSET);
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] timeVector = timeSelector.getLongVector();
+ final boolean[] nullValueVector = valueSelector.getNullVector();
+ boolean nullAbsent = false;
+ lastTime = buf.getLong(position);
+ //check if nullVector is found or not
Review Comment:
nit spacing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a diff in pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r857257165
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -836,7 +868,7 @@ public void testAnyAggregatorsOffHeapNumericNulls() throws Exception
public void testPrimitiveLatestInSubquery() throws Exception
{
// Cannot vectorize LATEST aggregator.
- skipVectorize();
+ //skipVectorize();
Review Comment:
Please remove the method call and the comment above from the tests that have now been vectorized.
```suggestion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s merged pull request #12439: Vectorize numeric latest aggregators
Posted by GitBox <gi...@apache.org>.
suneet-s merged PR #12439:
URL: https://github.com/apache/druid/pull/12439
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a diff in pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r852445740
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
Review Comment:
this variable name seems confusing. if `nulls` == null, that means none of the rows are null correct?
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
+ }
+
+ //the time vector is already sorted so the last element would be the latest
+ //traverse the value vector from the back (for latest)
+
+ int index = 0;
Review Comment:
```suggestion
int index = endRow - i;
```
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
+ }
+
+ //the time vector is already sorted so the last element would be the latest
+ //traverse the value vector from the back (for latest)
+
+ int index = 0;
+ if (!useDefault && foundNull == false) {
+ for (int i = endRow - 1; i >= startRow; i--) {
+ if (nulls[i] == false) {
+ index = i;
+ break;
+ }
+ }
+ } else {
+ index = endRow - 1;
+ }
+
+ //find the first non-null value
+ final long latestTime = vector[index];
+
+ if (latestTime > lastTime) {
+ lastTime = latestTime;
+ if (useDefault || (index >= startRow && index < endRow)) {
+ putValue(buf, position, index);
+ rhsNull = false;
+ } else {
+ rhsNull = true;
+ }
+ }
+ }
+
+ @Override
+ public void aggregate(
+ ByteBuffer buf,
+ int numRows,
+ int[] positions,
+ @Nullable int[] rows,
+ int positionOffset
+ )
+ {
+
Review Comment:
I think this needs to be implemented, but maybe it is still WIP?
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
+ }
+
+ //the time vector is already sorted so the last element would be the latest
+ //traverse the value vector from the back (for latest)
+
+ int index = 0;
+ if (!useDefault && foundNull == false) {
+ for (int i = endRow - 1; i >= startRow; i--) {
+ if (nulls[i] == false) {
Review Comment:
nit: I think the style guide prefers not comparing booleans to a boolean literal.
```suggestion
if (!nulls[i]) {
```
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
+ }
+
+ //the time vector is already sorted so the last element would be the latest
+ //traverse the value vector from the back (for latest)
+
+ int index = 0;
+ if (!useDefault && foundNull == false) {
+ for (int i = endRow - 1; i >= startRow; i--) {
+ if (nulls[i] == false) {
+ index = i;
+ break;
+ }
+ }
+ } else {
+ index = endRow - 1;
+ }
+
+ //find the first non-null value
+ final long latestTime = vector[index];
+
+ if (latestTime > lastTime) {
+ lastTime = latestTime;
+ if (useDefault || (index >= startRow && index < endRow)) {
+ putValue(buf, position, index);
+ rhsNull = false;
+ } else {
+ rhsNull = true;
+ }
+ }
+ }
+
+ @Override
+ public void aggregate(
+ ByteBuffer buf,
+ int numRows,
+ int[] positions,
+ @Nullable int[] rows,
+ int positionOffset
+ )
+ {
+
+ }
+
+ abstract void putValue(ByteBuffer buf, int position, int index);
+
+ @Override
+ public void close()
+ {
+
Review Comment:
nit:
```suggestion
// No resources to cleanup.
```
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
+ }
+
+ //the time vector is already sorted so the last element would be the latest
+ //traverse the value vector from the back (for latest)
+
+ int index = 0;
+ if (!useDefault && foundNull == false) {
+ for (int i = endRow - 1; i >= startRow; i--) {
+ if (nulls[i] == false) {
+ index = i;
+ break;
+ }
+ }
+ } else {
+ index = endRow - 1;
+ }
Review Comment:
part of nit on line 60
```suggestion
}
```
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
+ }
+
+ //the time vector is already sorted so the last element would be the latest
+ //traverse the value vector from the back (for latest)
+
+ int index = 0;
+ if (!useDefault && foundNull == false) {
+ for (int i = endRow - 1; i >= startRow; i--) {
+ if (nulls[i] == false) {
+ index = i;
+ break;
+ }
+ }
+ } else {
+ index = endRow - 1;
+ }
+
+ //find the first non-null value
+ final long latestTime = vector[index];
Review Comment:
I think there's a missing bounds check for `index` here since i can be 0 which might be less than start row
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] somu-imply commented on a diff in pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r852454286
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
+{
+ final VectorValueSelector valueSelector;
+ private final boolean useDefault = NullHandling.replaceWithDefault();
+ private final VectorValueSelector timeSelector;
+ long lastTime;
+ boolean rhsNull;
+
+ public NumericLastVectorAggregator(VectorValueSelector timeSelector, VectorValueSelector valueSelector)
+ {
+ this.timeSelector = timeSelector;
+ this.valueSelector = valueSelector;
+ lastTime = Long.MIN_VALUE;
+ rhsNull = !useDefault;
+ }
+
+ @Override
+ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+ {
+ final long[] vector = timeSelector.getLongVector();
+ final boolean[] nulls = valueSelector.getNullVector();
+ boolean foundNull = false;
+
+ //check if nulls is null or not
+ if (nulls == null) {
+ foundNull = true;
+ }
+
+ //the time vector is already sorted so the last element would be the latest
+ //traverse the value vector from the back (for latest)
+
+ int index = 0;
+ if (!useDefault && foundNull == false) {
+ for (int i = endRow - 1; i >= startRow; i--) {
+ if (nulls[i] == false) {
+ index = i;
+ break;
+ }
+ }
+ } else {
+ index = endRow - 1;
+ }
+
+ //find the first non-null value
+ final long latestTime = vector[index];
+
+ if (latestTime > lastTime) {
+ lastTime = latestTime;
+ if (useDefault || (index >= startRow && index < endRow)) {
+ putValue(buf, position, index);
+ rhsNull = false;
+ } else {
+ rhsNull = true;
+ }
+ }
+ }
+
+ @Override
+ public void aggregate(
+ ByteBuffer buf,
+ int numRows,
+ int[] positions,
+ @Nullable int[] rows,
+ int positionOffset
+ )
+ {
+
Review Comment:
Yes it is
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] somu-imply commented on pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
somu-imply commented on PR #12439:
URL: https://github.com/apache/druid/pull/12439#issuecomment-1108672033
Will address the comments and add the unit tests today
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on pull request #12439: Vectorize numeric latest aggregators
Posted by GitBox <gi...@apache.org>.
suneet-s commented on PR #12439:
URL: https://github.com/apache/druid/pull/12439#issuecomment-1110123341
Overruling Travis because test failures are due to lack of code coverage. The code coverage should be substantially higher than reported because this does not include the CalciteQueryTests and the null compatibility tests that are done in separate jobs
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a diff in pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r851497339
##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -648,6 +648,42 @@ public void testEarliestAggregators() throws Exception
);
}
+ @Test
+ public void testLatestVectorAggregators() throws Exception
+ {
+ // Cannot vectorize LATEST aggregator.
+ //skipVectorize();
+
+ testQuery(
+ "SELECT "
+ + "LATEST(cnt), LATEST(cnt + 1), LATEST(m1), LATEST(m1+1) "
+ + "FROM druid.numfoo",
+ ImmutableList.of(
+ Druids.newTimeseriesQueryBuilder()
+ .dataSource(CalciteTests.DATASOURCE3)
+ .intervals(querySegmentSpec(Filtration.eternity()))
+ .granularity(Granularities.ALL)
+ .virtualColumns(
+ expressionVirtualColumn("v0", "(\"cnt\" + 1)", ColumnType.LONG),
+ expressionVirtualColumn("v1", "(\"m1\" + 1)", ColumnType.FLOAT)
+ )
+ .aggregators(
+ aggregators(
+ new LongLastAggregatorFactory("a0", "cnt", null),
+ new LongLastAggregatorFactory("a1", "v0", null),
+ new FloatLastAggregatorFactory("a2", "m1", null),
+ new FloatLastAggregatorFactory("a3", "v1", null)
+ )
+ )
+ .context(QUERY_CONTEXT_DEFAULT)
+ .build()
+ ),
+ ImmutableList.of(
+ new Object[]{1L, 2L, 6.0f, 7.0f}
+ )
+ );
+ }
+
Review Comment:
Can you search for uses of these AggregatorFactories in this class and remove calls to `skipVectorize()`.
This way you should get additional coverage pretty easily.
I found 8 references to `// Cannot vectorize LATEST aggregator.` when I searched.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a diff in pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r857874447
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/FloatLastVectorAggregator.java:
##########
@@ -47,9 +47,9 @@ void putValue(ByteBuffer buf, int position, int index)
}
@Override
- void initValue(ByteBuffer buf, int position)
+ public void initValue(ByteBuffer buf, int position)
{
- buf.putFloat(position, 0);
+ buf.putFloat(position, Float.NEGATIVE_INFINITY);
Review Comment:
hmm shouldn't this be 0? `FloatLastAggregator` initializes to 0
```suggestion
buf.putFloat(position, 0);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a diff in pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r852374903
##########
processing/src/main/java/org/apache/druid/query/aggregation/last/NumericLastVectorAggregator.java:
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public abstract class NumericLastVectorAggregator implements VectorAggregator
Review Comment:
javadocs please.
Similar comment for all new classes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s commented on a diff in pull request #12439: Vectorize numeric latest aggregators
Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12439:
URL: https://github.com/apache/druid/pull/12439#discussion_r858080166
##########
processing/src/test/java/org/apache/druid/query/aggregation/last/FloatLastVectorAggregatorTest.java:
##########
@@ -65,14 +65,15 @@ public void setup()
@Test
public void initValueShouldBeNegInf()
{
- target.initValue(buf, 0);
+ target.init(buf, 0);
Review Comment:
This test looks incorrect. When we call init, the timestamp should be inserted at `position`
```suggestion
target.initValue(buf, 0);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] somu-imply commented on pull request #12439: Vectorize latest aggregator
Posted by GitBox <gi...@apache.org>.
somu-imply commented on PR #12439:
URL: https://github.com/apache/druid/pull/12439#issuecomment-1108840648
Could beef up coverage to
<html>
<body>
-- | -- | -- | -- | --
DoubleLastVectorAggregator | 100% (1/1) | 100% (4/4) | 100% (10/10) | 50% (1/2)
FloatLastVectorAggregator | 100% (1/1) | 100% (4/4) | 100% (10/10) | 50% (1/2)
LongLastVectorAggregator | 100% (1/1) | 100% (4/4) | 100% (10/10) | 50% (1/2)
NumericLastVectorAggregator | 100% (1/1) | 62% (5/8) | 71% (37/52) | 34% (13/38)
</body>
</html>