You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:52:46 UTC

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #5765: Optimize DistinctCount to store dictIds within segment

Jackie-Jiang opened a new pull request #5765:
URL: https://github.com/apache/incubator-pinot/pull/5765


   ## Description
   For DistinctCount aggregation function, we can store dictIds within segment without fetching the values, and read the values from dictionary before returning the result for the segment.
   
   Performance comparison:
   With the data in `OfflineClusterIntegrationTest`, for query: `select distinctcountmv(DivAirports) from mytable`, latency dropped from 18ms to 6ms.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5765: Optimize DistinctCount to store dictIds within segment

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5765:
URL: https://github.com/apache/incubator-pinot/pull/5765#discussion_r462055513



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
##########
@@ -32,32 +35,33 @@
 public class ProjectionBlockValSet implements BlockValSet {
   private final DataBlockCache _dataBlockCache;
   private final String _column;
-  private final DataType _dataType;
-  private final boolean _singleValue;
+  private final DataSource _dataSource;
 
   /**
    * Constructor for the class.
-   * The dataBlockCache argument is initialized in {@link ProjectionOperator},
-   * so that it can be reused across multiple calls to {@link ProjectionOperator#nextBlock()}.
-   *
-   * @param dataBlockCache data block cache
-   * @param column Projection column.
+   * The dataBlockCache is initialized in {@link ProjectionOperator} so that it can be reused across multiple calls to
+   * {@link ProjectionOperator#nextBlock()}.
    */
-  public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataType dataType, boolean singleValue) {
+  public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataSource dataSource) {
     _dataBlockCache = dataBlockCache;
     _column = column;
-    _dataType = dataType;
-    _singleValue = singleValue;
+    _dataSource = dataSource;
   }
 
   @Override
   public DataType getValueType() {
-    return _dataType;
+    return _dataSource.getDataSourceMetadata().getDataType();
   }
 
   @Override
   public boolean isSingleValue() {
-    return _singleValue;
+    return _dataSource.getDataSourceMetadata().isSingleValue();
+  }
+
+  @Nullable
+  @Override
+  public Dictionary getDictionary() {

Review comment:
       For group-by, the dictionary is fetched from the `TransformFunction`

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountAggregationFunction.java
##########
@@ -212,11 +259,57 @@ public IntOpenHashSet extractGroupByResult(GroupByResultHolder groupByResultHold
     IntOpenHashSet valueSet = groupByResultHolder.getResult(groupKey);
     if (valueSet == null) {
       return new IntOpenHashSet();
+    }
+
+    if (_dictionary != null) {
+      // For dictionary-encoded expression, convert dictionary ids to values
+      return convertToValueSet(valueSet);
     } else {
       return valueSet;
     }
   }
 
+  private IntOpenHashSet convertToValueSet(IntOpenHashSet dictIdSet) {

Review comment:
       Added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5765: Optimize DistinctCount to store dictIds within segment

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5765:
URL: https://github.com/apache/incubator-pinot/pull/5765#discussion_r462002553



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/ProjectionBlock.java
##########
@@ -25,22 +25,22 @@
 import org.apache.pinot.core.common.BlockMetadata;
 import org.apache.pinot.core.common.BlockValSet;
 import org.apache.pinot.core.common.DataBlockCache;
-import org.apache.pinot.core.common.DataSourceMetadata;
+import org.apache.pinot.core.common.DataSource;
 import org.apache.pinot.core.operator.docvalsets.ProjectionBlockValSet;
-import org.apache.pinot.spi.data.FieldSpec;
 
 
 /**
  * ProjectionBlock holds a column name to Block Map.
  * It provides DocIdSetBlock for a given column.
  */
 public class ProjectionBlock implements Block {
-  private final Map<String, DataSourceMetadata> _dataSourceMetadataMap;
+  private final Map<String, DataSource> _dataSourceMap;

Review comment:
       probably not relevant to this PR but we should try to move towards using Block instead of DataSource

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java
##########
@@ -32,32 +35,33 @@
 public class ProjectionBlockValSet implements BlockValSet {
   private final DataBlockCache _dataBlockCache;
   private final String _column;
-  private final DataType _dataType;
-  private final boolean _singleValue;
+  private final DataSource _dataSource;
 
   /**
    * Constructor for the class.
-   * The dataBlockCache argument is initialized in {@link ProjectionOperator},
-   * so that it can be reused across multiple calls to {@link ProjectionOperator#nextBlock()}.
-   *
-   * @param dataBlockCache data block cache
-   * @param column Projection column.
+   * The dataBlockCache is initialized in {@link ProjectionOperator} so that it can be reused across multiple calls to
+   * {@link ProjectionOperator#nextBlock()}.
    */
-  public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataType dataType, boolean singleValue) {
+  public ProjectionBlockValSet(DataBlockCache dataBlockCache, String column, DataSource dataSource) {
     _dataBlockCache = dataBlockCache;
     _column = column;
-    _dataType = dataType;
-    _singleValue = singleValue;
+    _dataSource = dataSource;
   }
 
   @Override
   public DataType getValueType() {
-    return _dataType;
+    return _dataSource.getDataSourceMetadata().getDataType();
   }
 
   @Override
   public boolean isSingleValue() {
-    return _singleValue;
+    return _dataSource.getDataSourceMetadata().isSingleValue();
+  }
+
+  @Nullable
+  @Override
+  public Dictionary getDictionary() {

Review comment:
       how come we never needed this? For group by, we already have the optimization to work on dictionary ids right?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/DistinctCountAggregationFunction.java
##########
@@ -212,11 +259,57 @@ public IntOpenHashSet extractGroupByResult(GroupByResultHolder groupByResultHold
     IntOpenHashSet valueSet = groupByResultHolder.getResult(groupKey);
     if (valueSet == null) {
       return new IntOpenHashSet();
+    }
+
+    if (_dictionary != null) {
+      // For dictionary-encoded expression, convert dictionary ids to values
+      return convertToValueSet(valueSet);
     } else {
       return valueSet;
     }
   }
 
+  private IntOpenHashSet convertToValueSet(IntOpenHashSet dictIdSet) {

Review comment:
       java doc for this?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org