You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/08/18 05:55:41 UTC

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #5889: Add HAVING support

Jackie-Jiang opened a new pull request #5889:
URL: https://github.com/apache/incubator-pinot/pull/5889


   ## Description
   Add support for SQL HAVING clause
   
   Example of supported query:
   `SELECT MAX(ArrDelay) - MAX(AirTime) AS Diff, DaysSinceEpoch FROM mytable GROUP BY DaysSinceEpoch HAVING Diff > 500 ORDER BY Diff ASC`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5889: Add HAVING support

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5889:
URL: https://github.com/apache/incubator-pinot/pull/5889#discussion_r474196092



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/reduce/HavingFilterHandler.java
##########
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.reduce;
+
+import java.util.List;
+import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
+import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluatorProvider;
+import org.apache.pinot.core.query.request.context.FilterContext;
+import org.apache.pinot.core.query.request.context.predicate.Predicate;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.utils.ByteArray;
+
+
+/**
+ * Handler for HAVING clause.
+ */
+public class HavingFilterHandler {
+  private final PostAggregationHandler _postAggregationHandler;
+  private final RowMatcher _rowMatcher;
+
+  public HavingFilterHandler(FilterContext havingFilter, PostAggregationHandler postAggregationHandler) {
+    _postAggregationHandler = postAggregationHandler;
+    _rowMatcher = getRowMatcher(havingFilter);
+  }
+
+  /**
+   * Returns {@code true} if the given row matches the HAVING clause, {@code false} otherwise.
+   */
+  public boolean isMatch(Object[] row) {
+    return _rowMatcher.isMatch(row);
+  }
+
+  /**
+   * Helper method to construct a RowMatcher based on the given filter.
+   */
+  private RowMatcher getRowMatcher(FilterContext filter) {
+    switch (filter.getType()) {
+      case AND:
+        return new AndRowMatcher(filter.getChildren());
+      case OR:
+        return new OrRowMatcher(filter.getChildren());
+      case PREDICATE:
+        return new PredicateRowMatcher(filter.getPredicate());
+      default:
+        throw new IllegalStateException();
+    }
+  }
+
+  /**
+   * Filter matcher for the row.
+   */
+  private interface RowMatcher {
+
+    /**
+     * Returns {@code true} if the given row matches the filter, {@code false} otherwise.
+     */
+    boolean isMatch(Object[] row);
+  }
+
+  /**
+   * AND filter matcher.
+   */
+  private class AndRowMatcher implements RowMatcher {
+    RowMatcher[] _childMatchers;
+
+    AndRowMatcher(List<FilterContext> childFilters) {
+      int numChildren = childFilters.size();
+      _childMatchers = new RowMatcher[numChildren];
+      for (int i = 0; i < numChildren; i++) {
+        _childMatchers[i] = getRowMatcher(childFilters.get(i));
+      }
+    }
+
+    @Override
+    public boolean isMatch(Object[] row) {
+      for (RowMatcher childMatcher : _childMatchers) {
+        if (!childMatcher.isMatch(row)) {
+          return false;
+        }
+      }
+      return true;
+    }
+  }
+
+  /**
+   * OR filter matcher.
+   */
+  private class OrRowMatcher implements RowMatcher {
+    RowMatcher[] _childMatchers;
+
+    OrRowMatcher(List<FilterContext> childFilters) {
+      int numChildren = childFilters.size();
+      _childMatchers = new RowMatcher[numChildren];
+      for (int i = 0; i < numChildren; i++) {
+        _childMatchers[i] = getRowMatcher(childFilters.get(i));
+      }
+    }
+
+    @Override
+    public boolean isMatch(Object[] row) {
+      for (RowMatcher childMatcher : _childMatchers) {
+        if (childMatcher.isMatch(row)) {
+          return true;
+        }
+      }
+      return false;
+    }
+  }
+
+  /**
+   * Predicate matcher.
+   */
+  private class PredicateRowMatcher implements RowMatcher {
+    PostAggregationHandler.ValueExtractor _valueExtractor;
+    DataType _valueType;
+    PredicateEvaluator _predicateEvaluator;
+
+    PredicateRowMatcher(Predicate predicate) {
+      _valueExtractor = _postAggregationHandler.getValueExtractor(predicate.getLhs());
+      switch (_valueExtractor.getColumnDataType()) {
+        case INT:
+          _valueType = DataType.INT;
+          break;
+        case LONG:
+          _valueType = DataType.LONG;
+          break;
+        case FLOAT:
+          _valueType = DataType.FLOAT;
+          break;
+        case DOUBLE:
+          _valueType = DataType.DOUBLE;
+          break;
+        case STRING:
+          _valueType = DataType.STRING;
+          break;
+        case BYTES:
+          _valueType = DataType.BYTES;
+          break;
+        default:
+          throw new IllegalStateException();
+      }
+      _predicateEvaluator = PredicateEvaluatorProvider.getPredicateEvaluator(predicate, null, _valueType);
+    }
+
+    @Override
+    public boolean isMatch(Object[] row) {
+      Object value = _valueExtractor.extract(row);
+      switch (_valueType) {
+        case INT:
+          return _predicateEvaluator.applySV((int) value);
+        case LONG:
+          return _predicateEvaluator.applySV((long) value);
+        case FLOAT:
+          return _predicateEvaluator.applySV((float) value);
+        case DOUBLE:
+          return _predicateEvaluator.applySV((double) value);
+        case STRING:
+          return _predicateEvaluator.applySV((String) value);
+        case BYTES:
+          return _predicateEvaluator.applySV(((ByteArray) value).getBytes());
+        default:
+          throw new IllegalStateException();

Review comment:
       It is not possible to run into this branch as the `_valueType` is set inside the class




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] npawar commented on a change in pull request #5889: Add HAVING support

Posted by GitBox <gi...@apache.org>.
npawar commented on a change in pull request #5889:
URL: https://github.com/apache/incubator-pinot/pull/5889#discussion_r474133820



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/reduce/HavingFilterHandler.java
##########
@@ -0,0 +1,182 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.query.reduce;
+
+import java.util.List;
+import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
+import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluatorProvider;
+import org.apache.pinot.core.query.request.context.FilterContext;
+import org.apache.pinot.core.query.request.context.predicate.Predicate;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.utils.ByteArray;
+
+
+/**
+ * Handler for HAVING clause.
+ */
+public class HavingFilterHandler {
+  private final PostAggregationHandler _postAggregationHandler;
+  private final RowMatcher _rowMatcher;
+
+  public HavingFilterHandler(FilterContext havingFilter, PostAggregationHandler postAggregationHandler) {
+    _postAggregationHandler = postAggregationHandler;
+    _rowMatcher = getRowMatcher(havingFilter);
+  }
+
+  /**
+   * Returns {@code true} if the given row matches the HAVING clause, {@code false} otherwise.
+   */
+  public boolean isMatch(Object[] row) {
+    return _rowMatcher.isMatch(row);
+  }
+
+  /**
+   * Helper method to construct a RowMatcher based on the given filter.
+   */
+  private RowMatcher getRowMatcher(FilterContext filter) {
+    switch (filter.getType()) {
+      case AND:
+        return new AndRowMatcher(filter.getChildren());
+      case OR:
+        return new OrRowMatcher(filter.getChildren());
+      case PREDICATE:
+        return new PredicateRowMatcher(filter.getPredicate());
+      default:
+        throw new IllegalStateException();
+    }
+  }
+
+  /**
+   * Filter matcher for the row.
+   */
+  private interface RowMatcher {
+
+    /**
+     * Returns {@code true} if the given row matches the filter, {@code false} otherwise.
+     */
+    boolean isMatch(Object[] row);
+  }
+
+  /**
+   * AND filter matcher.
+   */
+  private class AndRowMatcher implements RowMatcher {
+    RowMatcher[] _childMatchers;
+
+    AndRowMatcher(List<FilterContext> childFilters) {
+      int numChildren = childFilters.size();
+      _childMatchers = new RowMatcher[numChildren];
+      for (int i = 0; i < numChildren; i++) {
+        _childMatchers[i] = getRowMatcher(childFilters.get(i));
+      }
+    }
+
+    @Override
+    public boolean isMatch(Object[] row) {
+      for (RowMatcher childMatcher : _childMatchers) {
+        if (!childMatcher.isMatch(row)) {
+          return false;
+        }
+      }
+      return true;
+    }
+  }
+
+  /**
+   * OR filter matcher.
+   */
+  private class OrRowMatcher implements RowMatcher {
+    RowMatcher[] _childMatchers;
+
+    OrRowMatcher(List<FilterContext> childFilters) {
+      int numChildren = childFilters.size();
+      _childMatchers = new RowMatcher[numChildren];
+      for (int i = 0; i < numChildren; i++) {
+        _childMatchers[i] = getRowMatcher(childFilters.get(i));
+      }
+    }
+
+    @Override
+    public boolean isMatch(Object[] row) {
+      for (RowMatcher childMatcher : _childMatchers) {
+        if (childMatcher.isMatch(row)) {
+          return true;
+        }
+      }
+      return false;
+    }
+  }
+
+  /**
+   * Predicate matcher.
+   */
+  private class PredicateRowMatcher implements RowMatcher {
+    PostAggregationHandler.ValueExtractor _valueExtractor;
+    DataType _valueType;
+    PredicateEvaluator _predicateEvaluator;
+
+    PredicateRowMatcher(Predicate predicate) {
+      _valueExtractor = _postAggregationHandler.getValueExtractor(predicate.getLhs());
+      switch (_valueExtractor.getColumnDataType()) {
+        case INT:
+          _valueType = DataType.INT;
+          break;
+        case LONG:
+          _valueType = DataType.LONG;
+          break;
+        case FLOAT:
+          _valueType = DataType.FLOAT;
+          break;
+        case DOUBLE:
+          _valueType = DataType.DOUBLE;
+          break;
+        case STRING:
+          _valueType = DataType.STRING;
+          break;
+        case BYTES:
+          _valueType = DataType.BYTES;
+          break;
+        default:
+          throw new IllegalStateException();
+      }
+      _predicateEvaluator = PredicateEvaluatorProvider.getPredicateEvaluator(predicate, null, _valueType);
+    }
+
+    @Override
+    public boolean isMatch(Object[] row) {
+      Object value = _valueExtractor.extract(row);
+      switch (_valueType) {
+        case INT:
+          return _predicateEvaluator.applySV((int) value);
+        case LONG:
+          return _predicateEvaluator.applySV((long) value);
+        case FLOAT:
+          return _predicateEvaluator.applySV((float) value);
+        case DOUBLE:
+          return _predicateEvaluator.applySV((double) value);
+        case STRING:
+          return _predicateEvaluator.applySV((String) value);
+        case BYTES:
+          return _predicateEvaluator.applySV(((ByteArray) value).getBytes());
+        default:
+          throw new IllegalStateException();

Review comment:
       some message?

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTestSet.java
##########
@@ -241,6 +241,13 @@ public void testHardcodedSqlQueries()
     query =
         "SELECT DaysSinceEpoch, MAX(ArrDelay) * 2 - MAX(AirTime) - 3 FROM mytable GROUP BY DaysSinceEpoch ORDER BY MAX(ArrDelay) - MIN(AirTime) DESC";
     testSqlQuery(query, Collections.singletonList(query));
+
+    // Having
+    query = "SELECT COUNT(*) AS Count, DaysSinceEpoch FROM mytable GROUP BY DaysSinceEpoch HAVING Count > 350";
+    testSqlQuery(query, Collections.singletonList(query));
+    query =
+        "SELECT MAX(ArrDelay) - MAX(AirTime) AS Diff, DaysSinceEpoch FROM mytable GROUP BY DaysSinceEpoch HAVING Diff * 2 > 1000 ORDER BY Diff ASC";

Review comment:
       how about some queries with AND and OR RowMatchers as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang merged pull request #5889: Add HAVING support

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #5889:
URL: https://github.com/apache/incubator-pinot/pull/5889


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5889: Add HAVING support

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5889:
URL: https://github.com/apache/incubator-pinot/pull/5889#discussion_r472512962



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java
##########
@@ -208,50 +232,42 @@ private DataSchema getPrePostAggregationDataSchema(DataSchema dataSchema) {
   }
 
   private IndexedTable getIndexedTable(DataSchema dataSchema, Collection<DataTable> dataTables) {
-    int indexedTableCapacity = GroupByUtils.getTableCapacity(_queryContext);
-    IndexedTable indexedTable = new ConcurrentIndexedTable(dataSchema, _queryContext, indexedTableCapacity);
-
+    int capacity = GroupByUtils.getTableCapacity(_queryContext);
+    IndexedTable indexedTable = new SimpleIndexedTable(dataSchema, _queryContext, capacity);
+    ColumnDataType[] columnDataTypes = dataSchema.getColumnDataTypes();
     for (DataTable dataTable : dataTables) {
-      BiFunction[] functions = new BiFunction[_numColumns];
-      for (int i = 0; i < _numColumns; i++) {
-        ColumnDataType columnDataType = dataSchema.getColumnDataType(i);
-        BiFunction<Integer, Integer, Object> function;
-        switch (columnDataType) {
-          case INT:
-            function = dataTable::getInt;
-            break;
-          case LONG:
-            function = dataTable::getLong;
-            break;
-          case FLOAT:
-            function = dataTable::getFloat;
-            break;
-          case DOUBLE:
-            function = dataTable::getDouble;
-            break;
-          case STRING:
-            function = dataTable::getString;
-            break;
-          case BYTES:
-            function = dataTable::getBytes;
-            break;
-          case OBJECT:
-            function = dataTable::getObject;
-            break;
-          // Add other aggregation intermediate result / group-by column type supports here
-          default:
-            throw new IllegalStateException();
-        }
-        functions[i] = function;
-      }
-
-      for (int row = 0; row < dataTable.getNumberOfRows(); row++) {
-        Object[] columns = new Object[_numColumns];
-        for (int col = 0; col < _numColumns; col++) {
-          columns[col] = functions[col].apply(row, col);
+      int numRows = dataTable.getNumberOfRows();
+      for (int rowId = 0; rowId < numRows; rowId++) {
+        Object[] values = new Object[_numColumns];
+        for (int colId = 0; colId < _numColumns; colId++) {
+          switch (columnDataTypes[colId]) {

Review comment:
       NOTE: Change this to per-value switch based because we have found that this way has better performance (similar change in #4788 for performance improvement)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5889: Add HAVING support

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5889:
URL: https://github.com/apache/incubator-pinot/pull/5889#discussion_r474207344



##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTestSet.java
##########
@@ -241,6 +241,13 @@ public void testHardcodedSqlQueries()
     query =
         "SELECT DaysSinceEpoch, MAX(ArrDelay) * 2 - MAX(AirTime) - 3 FROM mytable GROUP BY DaysSinceEpoch ORDER BY MAX(ArrDelay) - MIN(AirTime) DESC";
     testSqlQuery(query, Collections.singletonList(query));
+
+    // Having
+    query = "SELECT COUNT(*) AS Count, DaysSinceEpoch FROM mytable GROUP BY DaysSinceEpoch HAVING Count > 350";
+    testSqlQuery(query, Collections.singletonList(query));
+    query =
+        "SELECT MAX(ArrDelay) - MAX(AirTime) AS Diff, DaysSinceEpoch FROM mytable GROUP BY DaysSinceEpoch HAVING Diff * 2 > 1000 ORDER BY Diff ASC";

Review comment:
       Added




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org