You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/26 04:53:58 UTC
[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Jackie-Jiang opened a new pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926
## Description
For issue #5925
Introduce `IdSet` which represents a collection of ids (currently support INT and LONG type) and can be used to replace the large IN clause to optimize the query. It is based on `RoaringBitmap` for INT type, and `Roaring64NavigableMap` for LONG type.
Add `IdSetAggregationFunction` to create `IdSet` for a column.
Bump up the version of `RoaringBitmap` to `0.9.0` to include some bug fixes for `Roaring64NavigableMap`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] codecov-commenter commented on pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-682301837
# [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=h1) Report
> Merging [#5926](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) will **decrease** coverage by `22.77%`.
> The diff coverage is `48.87%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/5926/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #5926 +/- ##
===========================================
- Coverage 66.44% 43.67% -22.78%
===========================================
Files 1075 1191 +116
Lines 54773 62438 +7665
Branches 8168 9533 +1365
===========================================
- Hits 36396 27269 -9127
- Misses 15700 32716 +17016
+ Partials 2677 2453 -224
```
| Flag | Coverage Δ | |
|---|---|---|
| #integration | `43.67% <48.87%> (?)` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
| [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
| [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `26.66% <0.00%> (-30.48%)` | :arrow_down: |
| [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `22.22% <0.00%> (-26.62%)` | :arrow_down: |
| [.../org/apache/pinot/client/ResultTableResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFRhYmxlUmVzdWx0U2V0LmphdmE=) | `24.00% <0.00%> (-10.29%)` | :arrow_down: |
| [.../org/apache/pinot/common/lineage/LineageEntry.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9MaW5lYWdlRW50cnkuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...apache/pinot/common/lineage/LineageEntryState.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9MaW5lYWdlRW50cnlTdGF0ZS5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...rg/apache/pinot/common/lineage/SegmentLineage.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9TZWdtZW50TGluZWFnZS5qYXZh) | `0.00% <0.00%> (ø)` | |
| [...ache/pinot/common/lineage/SegmentLineageUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbGluZWFnZS9TZWdtZW50TGluZWFnZVV0aWxzLmphdmE=) | `0.00% <0.00%> (ø)` | |
| [...ot/common/messages/RoutingTableRebuildMessage.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWVzc2FnZXMvUm91dGluZ1RhYmxlUmVidWlsZE1lc3NhZ2UuamF2YQ==) | `0.00% <ø> (ø)` | |
| ... and [1107 more](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=footer). Last update [e4ea9fc...fab464f](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-682301837
# [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=h1) Report
> Merging [#5926](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) will **increase** coverage by `0.90%`.
> The diff coverage is `68.99%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/5926/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #5926 +/- ##
==========================================
+ Coverage 66.44% 67.35% +0.90%
==========================================
Files 1075 1191 +116
Lines 54773 62438 +7665
Branches 8168 9533 +1365
==========================================
+ Hits 36396 42054 +5658
- Misses 15700 17286 +1586
- Partials 2677 3098 +421
```
| Flag | Coverage Δ | |
|---|---|---|
| #integration | `43.75% <48.87%> (?)` | |
| #unittests | `58.79% <56.33%> (?)` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
| [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
| [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `53.33% <0.00%> (-3.81%)` | :arrow_down: |
| [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `44.44% <0.00%> (-4.40%)` | :arrow_down: |
| [.../org/apache/pinot/client/ResultTableResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFRhYmxlUmVzdWx0U2V0LmphdmE=) | `24.00% <0.00%> (-10.29%)` | :arrow_down: |
| [...ot/common/messages/RoutingTableRebuildMessage.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWVzc2FnZXMvUm91dGluZ1RhYmxlUmVidWlsZE1lc3NhZ2UuamF2YQ==) | `54.54% <ø> (ø)` | |
| [...ache/pinot/common/metadata/ZKMetadataProvider.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0YWRhdGEvWktNZXRhZGF0YVByb3ZpZGVyLmphdmE=) | `70.55% <ø> (+3.70%)` | :arrow_up: |
| [...metadata/segment/LLCRealtimeSegmentZKMetadata.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0YWRhdGEvc2VnbWVudC9MTENSZWFsdGltZVNlZ21lbnRaS01ldGFkYXRhLmphdmE=) | `47.61% <ø> (+3.35%)` | :arrow_up: |
| [...g/apache/pinot/common/metrics/AbstractMetrics.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9BYnN0cmFjdE1ldHJpY3MuamF2YQ==) | `82.57% <ø> (+7.90%)` | :arrow_up: |
| [...a/org/apache/pinot/common/metrics/BrokerGauge.java](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJHYXVnZS5qYXZh) | `91.66% <ø> (+1.66%)` | :arrow_up: |
| ... and [860 more](https://codecov.io/gh/apache/incubator-pinot/pull/5926/diff?src=pr&el=tree-more) | |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=footer). Last update [e4ea9fc...fab464f](https://codecov.io/gh/apache/incubator-pinot/pull/5926?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#discussion_r479457266
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+ public static final char PARAMETER_DELIMITER = ';';
+ public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+ public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";
+ public static final String EXPECTED_INSERTIONS_KEY = "EXPECTEDINSERTIONS";
+ public static final String FPP_KEY = "FPP";
+
+ private final int _sizeThresholdInBytes;
+ private final int _expectedInsertions;
+ private final double _fpp;
+
+ /**
Review comment:
Added
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+ public static final char PARAMETER_DELIMITER = ';';
+ public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+ public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";
Review comment:
Added
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-682294392
@kishoreg Refactored the PR and introduced the `BloomFilterIdSet`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#discussion_r478837312
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+ public static final char PARAMETER_DELIMITER = ';';
+ public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+ public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";
Review comment:
what are these? java docs
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/IdSetAggregationFunction.java
##########
@@ -0,0 +1,339 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.common.function.AggregationFunctionType;
+import org.apache.pinot.common.utils.DataSchema.ColumnDataType;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import org.apache.pinot.core.query.request.context.ExpressionContext;
+import org.apache.pinot.core.query.utils.idset.IdSet;
+import org.apache.pinot.core.query.utils.idset.IdSets;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+/**
+ * The {@code IdSetAggregationFunction} collects the values for a given single-value expression into an IdSet, which can
+ * be used in the second query to optimize the query with huge IN clause generated from another query.
+ */
+public class IdSetAggregationFunction extends BaseSingleInputAggregationFunction<IdSet, String> {
+ public static final char PARAMETER_DELIMITER = ';';
+ public static final char PARAMETER_KEY_VALUE_SEPARATOR = '=';
+ public static final String SIZE_THRESHOLD_IN_BYTES_KEY = "SIZETHRESHOLDINBYTES";
+ public static final String EXPECTED_INSERTIONS_KEY = "EXPECTEDINSERTIONS";
+ public static final String FPP_KEY = "FPP";
+
+ private final int _sizeThresholdInBytes;
+ private final int _expectedInsertions;
+ private final double _fpp;
+
+ /**
Review comment:
add a sample call
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#discussion_r477036505
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/utils/IdSet.java
##########
@@ -0,0 +1,326 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.utils;
+
+import com.google.common.base.Preconditions;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Base64;
+import java.util.Objects;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.roaringbitmap.RoaringBitmap;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+
+/**
+ * The {@code IdSet} represents a collection of ids. It can be used to optimize the query with huge IN clause.
+ */
+public class IdSet implements Comparable<IdSet> {
+ public static final IdSet EMPTY_ID_SET = new IdSet(Type.EMPTY, null, null);
+
+ // Throw exception when the serialized IdSet is exceeding this threshold (32MB)
+ private static final int MAX_SIZE_IN_BYTES = 32 * 1024 * 1024;
+
+ private enum Type {
+ // DO NOT change the index of the types as the ser/de relies on them
+ EMPTY(0), ROARING_BITMAP(1), ROARING_64_NAVIGABLE_MAP(2);
+
+ private final int _index;
+
+ Type(int index) {
+ _index = index;
+ }
+ }
+
+ private final Type _type;
+ private final RoaringBitmap _bitmap;
+ private final Roaring64NavigableMap _longBitmap;
+
Review comment:
Do you think this constructor is the only thing we need? If not, We should make this private and add static builders
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/utils/IdSet.java
##########
@@ -0,0 +1,326 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.utils;
+
+import com.google.common.base.Preconditions;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Base64;
+import java.util.Objects;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.roaringbitmap.RoaringBitmap;
+import org.roaringbitmap.longlong.Roaring64NavigableMap;
+
+
+/**
+ * The {@code IdSet} represents a collection of ids. It can be used to optimize the query with huge IN clause.
+ */
+public class IdSet implements Comparable<IdSet> {
Review comment:
Let’s make this an interface with multiple implementations?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] mayankshriv removed a comment on pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
mayankshriv removed a comment on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-744633367
@Jackie-Jiang Any chance you can break the PR to have the thrift side changes separated from actual code changes? Or do they go hand in hand? Might be easier to review the PR if broken.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] Jackie-Jiang merged pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [incubator-pinot] mayankshriv commented on pull request #5926: Introduce IdSet and add IdSetAggregationFunction
Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #5926:
URL: https://github.com/apache/incubator-pinot/pull/5926#issuecomment-744633367
@Jackie-Jiang Any chance you can break the PR to have the thrift side changes separated from actual code changes? Or do they go hand in hand? Might be easier to review the PR if broken.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org