You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/27 10:45:24 UTC

[GitHub] [druid] clintropolis opened a new pull request #10441: vectorized group by support for numeric null columns

clintropolis opened a new pull request #10441:
URL: https://github.com/apache/druid/pull/10441


   ### Description
   This PR adds support for the vectorized group by engine to group by numeric columns with null values, using a similar approach to the non-vectorized engine of storing an extra byte in the key to hold information about whether the value is null or not. I modified the signature of the methods on `VectorColumnProcessorFactory` to include `ColumnCapabilities` for the selector, so that `GroupByVectorColumnSelectorFactory` methods could check `capabilities.hasNulls().isFalse()` to continue to use the existing processors, and only use the newly added null handling processors when there are actually null values to deal with.
   
   In the process of adding tests for this, I also uncovered another issue with the `SpillingGrouper` when grouping on double values, where sometimes when making the round trip through serde they end up as floats, resulting in a cast exception when run through the comparator. To fix this issue I modified the comparators in `RowBasedGrouperHelper` to check for double typed keys and coerce both values to be doubles. A more proper fix would be to rework the SpillingGrouper completely so that the key serde just reads/writes directly instead of going through json generator, but that was too much for this PR, so I shall leave that for follow-up work.
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10441: vectorized group by support for nullable numeric columns

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10441:
URL: https://github.com/apache/druid/pull/10441#discussion_r499911525



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java
##########
@@ -39,32 +42,58 @@ public static GroupByVectorColumnProcessorFactory instance()
   }
 
   @Override
-  public GroupByVectorColumnSelector makeSingleValueDimensionProcessor(final SingleValueDimensionVectorSelector selector)
+  public GroupByVectorColumnSelector makeSingleValueDimensionProcessor(
+      final ColumnCapabilities capabilities,
+      final SingleValueDimensionVectorSelector selector
+  )
   {
+    Preconditions.checkArgument(ValueType.STRING == capabilities.getType());

Review comment:
       nit: an error message would help debugging if it really happened.

##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/NullableDoubleGroupByVectorColumnSelector.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.vector;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+public class NullableDoubleGroupByVectorColumnSelector implements GroupByVectorColumnSelector
+{
+  private final VectorValueSelector selector;
+
+  NullableDoubleGroupByVectorColumnSelector(final VectorValueSelector selector)
+  {
+    this.selector = selector;
+  }
+
+  @Override
+  public int getGroupingKeySize()
+  {
+    return 1 + Double.BYTES;

Review comment:
       nit: `Byte.BYTES`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10441: vectorized group by support for nullable numeric columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10441:
URL: https://github.com/apache/druid/pull/10441#discussion_r499927921



##########
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/NullableDoubleGroupByVectorColumnSelector.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.groupby.epinephelinae.vector;
+
+import org.apache.datasketches.memory.Memory;
+import org.apache.datasketches.memory.WritableMemory;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.query.groupby.ResultRow;
+import org.apache.druid.segment.vector.VectorValueSelector;
+
+public class NullableDoubleGroupByVectorColumnSelector implements GroupByVectorColumnSelector
+{
+  private final VectorValueSelector selector;
+
+  NullableDoubleGroupByVectorColumnSelector(final VectorValueSelector selector)
+  {
+    this.selector = selector;
+  }
+
+  @Override
+  public int getGroupingKeySize()
+  {
+    return 1 + Double.BYTES;

Review comment:
       i hope `Byte.BYTES` is always 1, I wouldn’t know what to believe anymore :sweat_smile: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #10441: vectorized group by support for nullable numeric columns

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #10441:
URL: https://github.com/apache/druid/pull/10441


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #10441: vectorized group by support for nullable numeric columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #10441:
URL: https://github.com/apache/druid/pull/10441#issuecomment-699826391


   This PR will fail coverage checks in default mode (`druid.generic.useDefaultValueForNull=true`) because some of the added code is only applicable in SQL compatible null handling mode.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org