You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/29 07:24:15 UTC

[GitHub] [druid] somu-imply opened a new pull request, #12493: Vectorized version of string last aggregator

somu-imply opened a new pull request, #12493:
URL: https://github.com/apache/druid/pull/12493

   
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862940161


##########
processing/src/test/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregatorTest.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.testing.InitializedNullHandlingTest;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Answers;
+import org.mockito.Mock;
+import org.mockito.Mockito;
+import org.mockito.junit.MockitoJUnitRunner;
+
+import java.nio.ByteBuffer;
+import java.util.concurrent.ThreadLocalRandom;
+
+
+@RunWith(MockitoJUnitRunner.class)
+public class StringLastVectorAggregatorTest extends InitializedNullHandlingTest
+{
+  private static final double EPSILON = 1e-5;
+  private static final String[] VALUES = new String[]{"a", "b", null, "c"};

Review Comment:
   Can you add unit tests for when the `VALUES` are an array of SerializablePairLongString
   
   Particularly interesting tests are when the time values for the time selectors are the same, but the `lhs` in the pairs are different. eg What should `target.aggregate(...) && target.get(...)` return if this was the input
   
   |times|values|
   |12345000|<12345100, Last>|
   |12345000|<12345001, NotLast>|
   
   In this case, I would expect the result to be `<12345100, Last>`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r867523415


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {
+        index = i;
+        break;
+      }
+    }
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
+    if (foldNeeded) {
+      // Less efficient code path when folding is a possibility (we must read the value selector first just in case
+      // it's a foldable object).

Review Comment:
   Updated, unit test added



##########
processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java:
##########
@@ -59,6 +61,46 @@ public static boolean selectorNeedsFoldCheck(
            || SerializablePairLongString.class.isAssignableFrom(clazz);
   }
 
+  /**
+   * Returns whether an object *might* contain SerializablePairLongString objects.
+   */
+  public static boolean objectNeedsFoldCheck(Object obj)
+  {
+    if (obj == null) {
+      return false;
+    }
+    final Class<?> clazz = obj.getClass();
+    return clazz.isAssignableFrom(SerializablePairLongString.class)
+           || SerializablePairLongString.class.isAssignableFrom(clazz);
+  }
+
+  /**
+   * Return the object at a particular index from the vector selectors
+   */

Review Comment:
   Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862952927


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {
+        index = i;
+        break;
+      }
+    }
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
+    if (foldNeeded) {

Review Comment:
   I think you can get rid of this utility function and just call `objectsWhichMightBeStrings[index] instanceof SerializablePairLongString` I think that would be easier to read / maintain



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862945708


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {

Review Comment:
   I think this should be looking for the first non null value which has time >= lastTime



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r867523262


##########
processing/src/test/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregatorTest.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorColumnSelectorFactory;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.testing.InitializedNullHandlingTest;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.Answers;
+import org.mockito.Mock;
+import org.mockito.Mockito;
+import org.mockito.junit.MockitoJUnitRunner;
+
+import java.nio.ByteBuffer;
+import java.util.concurrent.ThreadLocalRandom;
+
+
+@RunWith(MockitoJUnitRunner.class)
+public class StringLastVectorAggregatorTest extends InitializedNullHandlingTest
+{
+  private static final double EPSILON = 1e-5;
+  private static final String[] VALUES = new String[]{"a", "b", null, "c"};

Review Comment:
   Updated added unit tests to reflect this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r868283733


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -69,47 +68,48 @@ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
     Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
 
     lastTime = buf.getLong(position);
-    int index = endRow - 1;
+    int index;
     for (int i = endRow - 1; i >= startRow; i--) {
-      if (objectsWhichMightBeStrings[i] != null) {
-        index = i;
+      if (times[i] < lastTime && objectsWhichMightBeStrings[i] != null) {
         break;
       }
-    }
-    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
-    if (foldNeeded) {
-      // Less efficient code path when folding is a possibility (we must read the value selector first just in case
-      // it's a foldable object).
-      final SerializablePairLongString inPair = StringFirstLastUtils.readPairFromVectorSelectorsAtIndex(
-          timeSelector,
-          valueSelector,
-          index
-      );
-      if (inPair != null) {
-        final long lastTime = buf.getLong(position);
-        if (inPair.lhs >= lastTime) {
+      index = i;
+      final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);

Review Comment:
   After the change, we are not hitting this function unless `objectsWhichMightBeStrings[index]` is non null. We are setting up a loop to continue if there is a null object. This guarantees that the function `objectNeedsFoldCheck` always accepts a non null object. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862942027


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {
+        index = i;
+        break;
+      }
+    }
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
+    if (foldNeeded) {
+      // Less efficient code path when folding is a possibility (we must read the value selector first just in case
+      // it's a foldable object).
+      final SerializablePairLongString inPair = StringFirstLastUtils.readPairFromVectorSelectorsAtIndex(
+          timeSelector,
+          valueSelector,
+          index
+      );
+      if (inPair != null) {
+        final long lastTime = buf.getLong(position);
+        if (inPair.lhs >= lastTime) {
+          StringFirstLastUtils.writePair(
+              buf,
+              position,
+              new SerializablePairLongString(inPair.lhs, inPair.rhs),
+              maxStringBytes
+          );
+        }
+      }
+    } else {
+      final long time = times[index];
+
+      if (time >= lastTime) {
+        final String value = DimensionHandlerUtils.convertObjectToString(objectsWhichMightBeStrings[index]);
+        lastTime = time;
+        StringFirstLastUtils.writePair(
+            buf,
+            position,
+            new SerializablePairLongString(time, value),
+            maxStringBytes
+        );
+      }
+    }
+  }
+
+  @Override
+  public void aggregate(
+      ByteBuffer buf,
+      int numRows,
+      int[] positions,
+      @Nullable int[] rows,
+      int positionOffset
+  )
+  {
+    long[] timeVector = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+    for (int i = 0; i < numRows; i++) {
+      int position = positions[i] + positionOffset;
+      int row = rows == null ? i : rows[i];
+      long lastTime = buf.getLong(position);
+      if (timeVector[row] >= lastTime) {
+        //check if needs fold check or not
+        final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[row]);

Review Comment:
   I think we're guaranteed that all the objects in the array are the same type, so we don't need to do this check on every matching row, so I think this can be moved to ~ line 125 with the appropriate out of bounds checks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862931985


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java:
##########
@@ -999,9 +1031,7 @@ public void testPrimitiveEarliestInSubquery() throws Exception
   @Test
   public void testStringLatestInSubquery() throws Exception
   {
-    // Cannot vectorize LATEST aggregator for Strings
-    skipVectorize();
-
+    //skipVectorize();

Review Comment:
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r868285312


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -69,47 +68,48 @@ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
     Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
 
     lastTime = buf.getLong(position);
-    int index = endRow - 1;
+    int index;
     for (int i = endRow - 1; i >= startRow; i--) {
-      if (objectsWhichMightBeStrings[i] != null) {
-        index = i;
+      if (times[i] < lastTime && objectsWhichMightBeStrings[i] != null) {
         break;
       }

Review Comment:
   Great comment ! I am adding this in



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on PR #12493:
URL: https://github.com/apache/druid/pull/12493#issuecomment-1114580249

   Let me know how this looks. I'll add the benchmarking tests and updated javadocs in the next checkin


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on PR #12493:
URL: https://github.com/apache/druid/pull/12493#issuecomment-1121698254

   Overruling flaky tests on Travis


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r867523539


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {
+        index = i;
+        break;
+      }
+    }
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
+    if (foldNeeded) {

Review Comment:
   I chose to keep it similar as to the normal aggregator for the same flow which is `Check if the selector class could possibly be a SerializablePairLongString (either a superclass or subclass).`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862951087


##########
processing/src/main/java/org/apache/druid/query/aggregation/first/StringFirstLastUtils.java:
##########
@@ -59,6 +61,46 @@ public static boolean selectorNeedsFoldCheck(
            || SerializablePairLongString.class.isAssignableFrom(clazz);
   }
 
+  /**
+   * Returns whether an object *might* contain SerializablePairLongString objects.
+   */
+  public static boolean objectNeedsFoldCheck(Object obj)
+  {
+    if (obj == null) {
+      return false;
+    }
+    final Class<?> clazz = obj.getClass();
+    return clazz.isAssignableFrom(SerializablePairLongString.class)
+           || SerializablePairLongString.class.isAssignableFrom(clazz);
+  }
+
+  /**
+   * Return the object at a particular index from the vector selectors
+   */

Review Comment:
   Important comment for this javadoc - index of bounds issues is the responsibility of the caller



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r868099233


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -69,47 +68,48 @@ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
     Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
 
     lastTime = buf.getLong(position);
-    int index = endRow - 1;
+    int index;
     for (int i = endRow - 1; i >= startRow; i--) {
-      if (objectsWhichMightBeStrings[i] != null) {
-        index = i;
+      if (times[i] < lastTime && objectsWhichMightBeStrings[i] != null) {
         break;
       }
-    }
-    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
-    if (foldNeeded) {
-      // Less efficient code path when folding is a possibility (we must read the value selector first just in case
-      // it's a foldable object).
-      final SerializablePairLongString inPair = StringFirstLastUtils.readPairFromVectorSelectorsAtIndex(
-          timeSelector,
-          valueSelector,
-          index
-      );
-      if (inPair != null) {
-        final long lastTime = buf.getLong(position);
-        if (inPair.lhs >= lastTime) {
+      index = i;
+      final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);

Review Comment:
   if `objectsWhichMightBeStrings[index]` is null, then the `objectNeedsFoldCheck(...)` call returns false. even though the array might contain `SerializablePairLongString` objects.
   
   Can we make it so that `objectNeedsFoldCheck` accepts a NonNull object or have it return true if the object is null. If the object is null, I don't think we need to do anything in this function, so I think only checking `foldNeeded` if the object is non null will be more efficient.



##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -123,13 +123,14 @@ public void aggregate(
   {
     long[] timeVector = timeSelector.getLongVector();
     Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+    // one time check on first element to judge if elements are serializable pairs or not
+    final int startRow = rows == null ? 0 : rows[0];
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[startRow]);

Review Comment:
   Similar comment to line 77



##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -69,47 +68,48 @@ public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
     Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
 
     lastTime = buf.getLong(position);
-    int index = endRow - 1;
+    int index;
     for (int i = endRow - 1; i >= startRow; i--) {
-      if (objectsWhichMightBeStrings[i] != null) {
-        index = i;
+      if (times[i] < lastTime && objectsWhichMightBeStrings[i] != null) {
         break;
       }

Review Comment:
   ```suggestion
         if (times[i] < lastTime) {
           break;
         }
         if (objectsWhichMightBeStrings[i] == null) {
           continue;
         }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r868285034


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -123,13 +123,14 @@ public void aggregate(
   {
     long[] timeVector = timeSelector.getLongVector();
     Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+    // one time check on first element to judge if elements are serializable pairs or not
+    final int startRow = rows == null ? 0 : rows[0];
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[startRow]);

Review Comment:
   After the change, we iterate over the array of objects to find the first non-null object and call it on that object only. That value is used at the rest of the places. This ensures similar behavior to the other case as above



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s merged PR #12493:
URL: https://github.com/apache/druid/pull/12493


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862948990


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {
+        index = i;
+        break;
+      }
+    }
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
+    if (foldNeeded) {
+      // Less efficient code path when folding is a possibility (we must read the value selector first just in case
+      // it's a foldable object).

Review Comment:
   In this case, you should be reading all the values in the array as long as the times array has an element where times[i] >= lastTime. Eg `lastTime=12345900`
   
   |times|values|
   |12345000|<12345020, NotLast020>|
   |12345000|<12345010, NotLast010>|
   |12345000|<12345100, NotLast100>|
   |12345000|<12345990, Last990>|
   |12345000|<12345120, NotLast120>|
   |12344000|<12344090, NotLast4000>|



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r862945708


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {

Review Comment:
   I think this should be looking for the first non null value which has time >= lastTime.
   
   And as soon as `time[i] < lastTime` you can exit out of the loop early



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r867523305


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {
+        index = i;
+        break;
+      }
+    }
+    final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[index]);
+    if (foldNeeded) {
+      // Less efficient code path when folding is a possibility (we must read the value selector first just in case
+      // it's a foldable object).
+      final SerializablePairLongString inPair = StringFirstLastUtils.readPairFromVectorSelectorsAtIndex(
+          timeSelector,
+          valueSelector,
+          index
+      );
+      if (inPair != null) {
+        final long lastTime = buf.getLong(position);
+        if (inPair.lhs >= lastTime) {
+          StringFirstLastUtils.writePair(
+              buf,
+              position,
+              new SerializablePairLongString(inPair.lhs, inPair.rhs),
+              maxStringBytes
+          );
+        }
+      }
+    } else {
+      final long time = times[index];
+
+      if (time >= lastTime) {
+        final String value = DimensionHandlerUtils.convertObjectToString(objectsWhichMightBeStrings[index]);
+        lastTime = time;
+        StringFirstLastUtils.writePair(
+            buf,
+            position,
+            new SerializablePairLongString(time, value),
+            maxStringBytes
+        );
+      }
+    }
+  }
+
+  @Override
+  public void aggregate(
+      ByteBuffer buf,
+      int numRows,
+      int[] positions,
+      @Nullable int[] rows,
+      int positionOffset
+  )
+  {
+    long[] timeVector = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+    for (int i = 0; i < numRows; i++) {
+      int position = positions[i] + positionOffset;
+      int row = rows == null ? i : rows[i];
+      long lastTime = buf.getLong(position);
+      if (timeVector[row] >= lastTime) {
+        //check if needs fold check or not
+        final boolean foldNeeded = StringFirstLastUtils.objectNeedsFoldCheck(objectsWhichMightBeStrings[row]);

Review Comment:
   Moved to one time check with the 0th element outside the loop



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] somu-imply commented on a diff in pull request #12493: Vectorized version of string last aggregator

Posted by GitBox <gi...@apache.org>.
somu-imply commented on code in PR #12493:
URL: https://github.com/apache/druid/pull/12493#discussion_r867523376


##########
processing/src/main/java/org/apache/druid/query/aggregation/last/StringLastVectorAggregator.java:
##########
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation.last;
+
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.query.aggregation.SerializablePairLongString;
+import org.apache.druid.query.aggregation.VectorAggregator;
+import org.apache.druid.query.aggregation.first.StringFirstLastUtils;
+import org.apache.druid.segment.DimensionHandlerUtils;
+import org.apache.druid.segment.vector.BaseLongVectorValueSelector;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+
+import javax.annotation.Nullable;
+import java.nio.ByteBuffer;
+
+public class StringLastVectorAggregator implements VectorAggregator
+{
+  private static final SerializablePairLongString INIT = new SerializablePairLongString(
+      DateTimes.MIN.getMillis(),
+      null
+  );
+  private final BaseLongVectorValueSelector timeSelector;
+  private final VectorObjectSelector valueSelector;
+  private final int maxStringBytes;
+  protected long lastTime;
+  protected String lastValue;
+
+  public StringLastVectorAggregator(
+      final BaseLongVectorValueSelector timeSelector,
+      final VectorObjectSelector valueSelector,
+      final int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    StringFirstLastUtils.writePair(buf, position, INIT, maxStringBytes);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position, int startRow, int endRow)
+  {
+    if (timeSelector == null) {
+      return;
+    }
+    long[] times = timeSelector.getLongVector();
+    Object[] objectsWhichMightBeStrings = valueSelector.getObjectVector();
+
+    lastTime = buf.getLong(position);
+    int index = endRow - 1;
+    for (int i = endRow - 1; i >= startRow; i--) {
+      if (objectsWhichMightBeStrings[i] != null) {

Review Comment:
   Updated now we check if time is less than lastTime and object is not null



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org