Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/18 12:24:52 UTC

[GitHub] [druid] clintropolis opened a new pull request #11010: vector group by support for string expressions

clintropolis opened a new pull request #11010:
URL: https://github.com/apache/druid/pull/11010


   ### Description
   This PR builds on the structures added in #10613 to support grouping on string expressions in the vectorized group by engine. The key addition is `DictionaryBuildingSingleValueStringGroupByVectorColumnSelector`, the vectorized counterpart of `DictionaryBuildingStringGroupByColumnSelectorStrategy`, which allows the vector group by engine to group on strings that are not dictionary encoded.
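   
   As a rough illustration of the dictionary-building idea (a minimal sketch, not the code in this PR; `OnTheFlyDictionary` and its methods are hypothetical names), each distinct string can be assigned a dense int id the first time it is seen, so the grouping key stays a fixed-width int even when the input has no dictionary of its own:
   
   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   
   // Hypothetical sketch, not Druid's implementation: builds a dictionary
   // while scanning, mapping each distinct string to a dense int id.
   class OnTheFlyDictionary
   {
     private final Map<String, Integer> idsByValue = new HashMap<>();
     private final List<String> valuesById = new ArrayList<>();
   
     // Returns the existing id for a value, or assigns the next free id.
     int idFor(String value)
     {
       Integer existing = idsByValue.get(value);
       if (existing != null) {
         return existing;
       }
       int id = valuesById.size();
       idsByValue.put(value, id);
       valuesById.add(value);
       return id;
     }
   
     // Reverse lookup, for turning int grouping keys back into strings.
     String valueFor(int id)
     {
       return valuesById.get(id);
     }
   
     // Encodes one vector (batch) of strings into int grouping keys.
     int[] encode(String[] vector)
     {
       int[] keys = new int[vector.length];
       for (int i = 0; i < vector.length; i++) {
         keys[i] = idFor(vector[i]);
       }
       return keys;
     }
   }
   ```
   
   In this sketch the dictionary grows with the number of distinct values seen, so a real implementation would also need to account for that memory.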
   
   To help showcase this, I added vectorization support to the concat operator `string1 + 'foo'`, and the concat function `concat(string1,'-',string2,'-',long1)`.
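   
   To give a feel for what vectorizing `concat` means (a simplified sketch under my own assumptions, not the processors added here; `VectorConcatSketch` is a hypothetical name), the function is applied to whole batches of rows at a time, with each argument supplied as a vector holding one value per row:
   
   ```java
   // Hypothetical sketch: element-wise concat over argument vectors.
   // Each element of args holds one value per row; a constant such as '-'
   // is modeled here as a vector that repeats the same value.
   class VectorConcatSketch
   {
     static String[] concat(int size, Object[]... args)
     {
       String[] out = new String[size];
       for (int row = 0; row < size; row++) {
         StringBuilder sb = new StringBuilder();
         for (Object[] arg : args) {
           // No null handling in this sketch: a null would be appended as "null".
           sb.append(arg[row]);
         }
         out[row] = sb.toString();
       }
       return out;
     }
   }
   ```
   
   For example, `VectorConcatSketch.concat(2, new String[]{"a", "b"}, new String[]{"-", "-"}, new Long[]{1L, 2L})` yields the two rows `a-1` and `b-2`.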
   
   It provides a solid performance increase: in the added benchmark queries below, forcing vectorization reduces average query time by about 36% for query 26 and about 48% for query 27.
   
   ```
         // 26: group by string expr with non-expr agg
         "SELECT CONCAT(string2, '-', long2), SUM(double1) FROM foo GROUP BY 1 ORDER BY 2",
         // 27: group by string expr with expr agg
         "SELECT CONCAT(string2, '-', long2), SUM(long1 * double4) FROM foo GROUP BY 1 ORDER BY 2"
   ```
   
   ```
   Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt     Score    Error  Units
   SqlExpressionBenchmark.querySql       26           5000000        false  avgt    5  1601.424 ± 22.075  ms/op
   SqlExpressionBenchmark.querySql       26           5000000        force  avgt    5  1017.797 ± 18.384  ms/op
   SqlExpressionBenchmark.querySql       27           5000000        false  avgt    5  2072.850 ± 46.369  ms/op
   SqlExpressionBenchmark.querySql       27           5000000        force  avgt    5  1072.897 ± 19.756  ms/op
   ```
   
   I will save vectorizing additional string expressions for a future PR.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `DictionaryBuildingSingleValueStringGroupByVectorColumnSelector`
    * `VectorGroupByEngine`
    * `GroupByVectorColumnProcessorFactory`
    * `VectorStringProcessors`
    * `StringOutMultiStringInVectorProcessor`
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] been tested in a test Druid cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11010: vector group by support for string expressions

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11010:
URL: https://github.com/apache/druid/pull/11010#discussion_r608347777



##########
File path: core/src/main/java/org/apache/druid/math/expr/vector/VectorStringProcessors.java
##########
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.math.expr.vector;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprType;
+
+import javax.annotation.Nullable;
+import java.util.List;
+
+public class VectorStringProcessors
+{
+  public static <T> ExprVectorProcessor<T> concat(Expr.VectorInputBindingInspector inspector, Expr left, Expr right)
+  {
+    final ExprVectorProcessor processor;
+    if (NullHandling.sqlCompatible()) {
+      processor = new StringOutStringsInFunctionVectorProcessor(
+          left.buildVectorized(inspector),
+          right.buildVectorized(inspector),
+          inspector.getMaxVectorSize()
+      )
+      {
+        @Nullable
+        @Override
+        protected String processValue(@Nullable String leftVal, @Nullable String rightVal)
+        {
+          return leftVal + rightVal;

Review comment:
       Maybe add a comment explaining why this branch does not handle nulls, unlike the other `concat` method below?
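       
       For context on the concern, here is a small standalone sketch (hypothetical class and method names, not part of this PR) of what a bare `leftVal + rightVal` does with a null operand in Java, next to two null policies one might expect. The SQL-compatible and default-mode semantics below are assumptions on my part, and whether nulls can reach `processValue` at all depends on the parent class, which this excerpt does not show:
       
       ```java
       // Hypothetical sketch of three ways to concatenate nullable strings.
       class ConcatNullHandlingSketch
       {
         // Plain Java concatenation renders a null operand as the text "null".
         static String naive(String left, String right)
         {
           return left + right;
         }
       
         // Assumed SQL-compatible semantics: any null operand makes the result null.
         static String sqlCompatible(String left, String right)
         {
           if (left == null || right == null) {
             return null;
           }
           return left + right;
         }
       
         // Assumed default-mode semantics: a null operand is treated as "".
         static String nullAsEmpty(String left, String right)
         {
           return (left == null ? "" : left) + (right == null ? "" : right);
         }
       }
       ```
       
       With a null left operand and `"foo"` on the right, `naive` returns `"nullfoo"`, `sqlCompatible` returns null, and `nullAsEmpty` returns `"foo"`.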






[GitHub] [druid] clintropolis commented on pull request #11010: vector group by support for string expressions

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #11010:
URL: https://github.com/apache/druid/pull/11010#issuecomment-816354811


   thanks for review @jihoonson 👍 




[GitHub] [druid] clintropolis merged pull request #11010: vector group by support for string expressions

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #11010:
URL: https://github.com/apache/druid/pull/11010


   

