You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/26 04:53:58 UTC

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Jackie-Jiang opened a new pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926


   ## Description
   For issue #5925 
   Introduce `IdSet` which represents a collection of ids (currently support INT and LONG type) and can be used to replace the large IN clause to optimize the query. It is based on `RoaringBitmap` for INT type, and `Roaring64NavigableMap` for LONG type.
   Add `IdSetAggregationFunction` to create `IdSet` for a column.
   Bump up the version of `RoaringBitmap` to `0.9.0` to include some bug fixes for `Roaring64NavigableMap`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter commented on pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-682301837


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=h1) Report
   > Merging [#5926](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) will **decrease** coverage by `22.77%`.
   > The diff coverage is `48.87%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/5926/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #5926       +/-   ##
   ===========================================
   - Coverage   66.44%   43.67%   -22.78%     
   ===========================================
     Files        1075     1191      +116     
     Lines       54773    62438     +7665     
     Branches     8168     9533     +1365     
   ===========================================
   - Hits        36396    27269     -9127     
   - Misses      15700    32716    +17016     
   + Partials     2677     2453      -224     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | #integration | `43.67% <48.87%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `26.66% <0.00%> (-30.48%)` | :arrow_down: |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `22.22% <0.00%> (-26.62%)` | :arrow_down: |
   | [.../org/apache/pinot/client/ResultTableResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFRhYmxlUmVzdWx0U2V0LmphdmE=) | `24.00% <0.00%> (-10.29%)` | :arrow_down: |
   | [.../org/apache/pinot/common/lineage/LineageEntry.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9MaW5lYWdlRW50cnkuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...apache/pinot/common/lineage/LineageEntryState.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9MaW5lYWdlRW50cnlTdGF0ZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...rg/apache/pinot/common/lineage/SegmentLineage.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9TZWdtZW50TGluZWFnZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/common/lineage/SegmentLineageUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9TZWdtZW50TGluZWFnZVV0aWxzLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...ot/common/messages/RoutingTableRebuildMessage.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWVzc2FnZXMvUm91dGluZ1RhYmxlUmVidWlsZE1lc3NhZ2UuamF2YQ==) | `0.00% <ø> (ø)` | |
   | ... and [1107 more](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=footer). Last update [e4ea9fc...fab464f](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-682301837


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=h1) Report
   > Merging [#5926](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) will **increase** coverage by `0.90%`.
   > The diff coverage is `68.99%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/5926/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #5926      +/-   ##
   ==========================================
   + Coverage   66.44%   67.35%   +0.90%     
   ==========================================
     Files        1075     1191     +116     
     Lines       54773    62438    +7665     
     Branches     8168     9533    +1365     
   ==========================================
   + Hits        36396    42054    +5658     
   - Misses      15700    17286    +1586     
   - Partials     2677     3098     +421     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | #integration | `43.75% <48.87%> (?)` | |
   | #unittests | `58.79% <56.33%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `53.33% <0.00%> (-3.81%)` | :arrow_down: |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `44.44% <0.00%> (-4.40%)` | :arrow_down: |
   | [.../org/apache/pinot/client/ResultTableResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFRhYmxlUmVzdWx0U2V0LmphdmE=) | `24.00% <0.00%> (-10.29%)` | :arrow_down: |
   | [...ot/common/messages/RoutingTableRebuildMessage.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWVzc2FnZXMvUm91dGluZ1RhYmxlUmVidWlsZE1lc3NhZ2UuamF2YQ==) | `54.54% <ø> (ø)` | |
   | [...ache/pinot/common/metadata/ZKMetadataProvider.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0YWRhdGEvWktNZXRhZGF0YVByb3ZpZGVyLmphdmE=) | `70.55% <ø> (+3.70%)` | :arrow_up: |
   | [...metadata/segment/LLCRealtimeSegmentZKMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0YWRhdGEvc2VnbWVudC9MTENSZWFsdGltZVNlZ21lbnRaS01ldGFkYXRhLmphdmE=) | `47.61% <ø> (+3.35%)` | :arrow_up: |
   | [...g/apache/pinot/common/metrics/AbstractMetrics.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9BYnN0cmFjdE1ldHJpY3MuamF2YQ==) | `82.57% <ø> (+7.90%)` | :arrow_up: |
   | [...a/org/apache/pinot/common/metrics/BrokerGauge.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJHYXVnZS5qYXZh) | `91.66% <ø> (+1.66%)` | :arrow_up: |
   | ... and [860 more](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=footer). Last update [e4ea9fc...fab464f](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#discussion_r479457266



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+  public static final char PARAMETER_DELIMITER = ';';
+  public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+  public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";
+  public static final String EXPECTED_INSERTIONS_KEY = "EXPECTEDINSERTIONS";
+  public static final String FPP_KEY = "FPP";
+
+  private final int _sizeThresholdInBytes;
+  private final int _expectedInsertions;
+  private final double _fpp;
+
+  /**

Review comment:
       Added

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+  public static final char PARAMETER_DELIMITER = ';';
+  public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+  public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";

Review comment:
       Added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-682294392


   @kishoreg Refactored the PR and introduced the `BloomFilterIdSet`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#discussion_r478837312



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+  public static final char PARAMETER_DELIMITER = ';';
+  public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+  public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";

Review comment:
       what are these? java docs

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+  public static final char PARAMETER_DELIMITER = ';';
+  public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+  public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";
+  public static final String EXPECTED_INSERTIONS_KEY = "EXPECTEDINSERTIONS";
+  public static final String FPP_KEY = "FPP";
+
+  private final int _sizeThresholdInBytes;
+  private final int _expectedInsertions;
+  private final double _fpp;
+
+  /**

Review comment:
       add a sample call




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#discussion_r477036505



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/utils/IdSet.java
##########
@@ -0,0 +1,326 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.utils;
+
+import com.google.common.base.Preconditions;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Base64;
+import java.util.Objects;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.roaringbitmap.RoaringBitmap;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+
+/**
+ * The {@code IdSet} represents a collection of ids. It can be used to optimize the query with huge IN clause.
+ */
+public class IdSet implements Comparable<IdSet> {
+  public static final IdSet EMPTY_ID_SET = new IdSet(Type.EMPTY, null, null);
+
+  // Throw exception when the serialized IdSet is exceeding this threshold (32MB)
+  private static final int MAX_SIZE_IN_BYTES = 32 * 1024 * 1024;
+
+  private enum Type {
+    // DO NOT change the index of the types as the ser/de relies on them
+    EMPTY(0), ROARING_BITMAP(1), ROARING_64_NAVIGABLE_MAP(2);
+
+    private final int _index;
+
+    Type(int index) {
+      _index = index;
+    }
+  }
+
+  private final Type _type;
+  private final RoaringBitmap _bitmap;
+  private final Roaring64NavigableMap _longBitmap;
+

Review comment:
       Do you think this constructor is the only thing we need? If not, We should make this private and add static builders

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/utils/IdSet.java
##########
@@ -0,0 +1,326 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.utils;
+
+import com.google.common.base.Preconditions;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Base64;
+import java.util.Objects;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.roaringbitmap.RoaringBitmap;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+
+/**
+ * The {@code IdSet} represents a collection of ids. It can be used to optimize the query with huge IN clause.
+ */
+public class IdSet implements Comparable<IdSet> {

Review comment:
       Let’s make this an interface with multiple implementations?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv removed a comment on pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
mayankshriv removed a comment on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-744633367


   @Jackie-Jiang Any chance you can break the PR to have the thrift side changes separated from actual code changes? Or do they go hand in hand? Might be easier to review the PR if broken.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang merged pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #5926: Introduce IdSet and add IdSetAggregationFunction

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-744633367


   @Jackie-Jiang Any chance you can break the PR to have the thrift side changes separated from actual code changes? Or do they go hand in hand? Might be easier to review the PR if broken.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org