Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/09/30 21:47:03 UTC

[GitHub] [incubator-pinot] fx19880617 opened a new pull request #6084: Adding array transform functions: array_average, array_max, array_min, array_sum

fx19880617 opened a new pull request #6084:
URL: https://github.com/apache/incubator-pinot/pull/6084


   ## Description
   Adds array transform functions that operate on multi-value columns.
   Functions added:
   - `array_average`
   - `array_max`
   - `array_min`
   - `array_sum`
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6084: Adding array transform functions: array_average, array_max, array_min, array_sum

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6084:
URL: https://github.com/apache/incubator-pinot/pull/6084#discussion_r497837296



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/function/TransformFunctionType.java
##########
@@ -53,6 +53,10 @@
   DATETIMECONVERT("dateTimeConvert"),
   DATETRUNC("dateTrunc"),
   ARRAYLENGTH("arrayLength"),
+  ARRAY_AVERAGE("array_average"),

Review comment:
       Remove the underscore so that it works for both `arrayAverage` and `array_average`?
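
To illustrate the naming suggestion: if function names are resolved by stripping underscores and ignoring case, a single underscore-free registration answers to both spellings. The sketch below assumes that behavior; `FunctionNameRegistry`, `canonicalize`, and `lookup` are hypothetical names used only for illustration, not Pinot's actual lookup code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry for illustration; not Pinot's TransformFunctionType lookup.
final class FunctionNameRegistry {
  private static final Map<String, String> CANONICAL_TO_FUNCTION = new HashMap<>();

  static {
    // Register each function once, under its underscore-free, lower-case key.
    for (String name : new String[]{"arrayAverage", "arrayMax", "arrayMin", "arraySum"}) {
      CANONICAL_TO_FUNCTION.put(canonicalize(name), name);
    }
  }

  // Drop underscores and ignore case so both spellings map to the same key.
  static String canonicalize(String name) {
    return name.replace("_", "").toLowerCase(Locale.ROOT);
  }

  // Returns the registered name, or null if the function is unknown.
  static String lookup(String userSuppliedName) {
    return CANONICAL_TO_FUNCTION.get(canonicalize(userSuppliedName));
  }
}
```

With this shape, `lookup("array_average")` and `lookup("arrayAverage")` resolve to the same entry, whereas a registration that keeps the underscore would only match the underscore spelling.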

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArraySumTransformFunction.java
##########
@@ -0,0 +1,148 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.pinot.core.common.DataSource;
+import org.apache.pinot.core.operator.blocks.ProjectionBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.util.ArrayCopyUtils;
+import org.apache.pinot.spi.data.FieldSpec;
+
+
+/**
+ * The ArraySumTransformFunction class implements array_sum function for multi-valued columns
+ *
+ * Sample queries:
+ * SELECT COUNT(*) FROM table WHERE array_sum(mvColumn) > 2
+ * SELECT COUNT(*) FROM table GROUP BY array_sum(mvColumn)
+ * SELECT SUM(array_sum(mvColumn)) FROM table
+ */
+public class ArraySumTransformFunction extends BaseTransformFunction {
+  public static final String FUNCTION_NAME = "array_sum";
+
+  private long[] _longResults;
+  private double[] _doubleResults;
+  private TransformFunction _argument;
+  private TransformResultMetadata _resultMetadata;
+
+  @Override
+  public String getName() {
+    return FUNCTION_NAME;
+  }
+
+  @Override
+  public void init(List<TransformFunction> arguments, Map<String, DataSource> dataSourceMap) {
+    // Check that there is only 1 argument
+    if (arguments.size() != 1) {
+      throw new IllegalArgumentException("Exactly 1 argument is required for ARRAY_SUM transform function");
+    }
+
+    // Check that the argument is a multi-valued column or transform function
+    TransformFunction firstArgument = arguments.get(0);
+    if (firstArgument instanceof LiteralTransformFunction || firstArgument.getResultMetadata().isSingleValue()) {
+      throw new IllegalArgumentException(
+          "The argument of ARRAY_SUM transform function must be a multi-valued column or a transform function");
+    }
+    FieldSpec.DataType resultDataType;
+    switch (firstArgument.getResultMetadata().getDataType()) {

Review comment:
       I think this function can always return doubles, which keeps the behavior consistent between aggregation and transform.
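
A minimal sketch of what "always return doubles" could look like for an `INT` multi-value block. `ArraySumSketch` and `sumInts` are hypothetical names; plain arrays stand in for the projection block, so this is not the code that was merged.

```java
// Hypothetical stand-alone sketch; plain arrays stand in for Pinot's ProjectionBlock.
final class ArraySumSketch {
  // Sums each row of an INT multi-value block and reports the result as a double,
  // so the result type does not depend on the argument type.
  static double[] sumInts(int[][] intValuesMV) {
    double[] results = new double[intValuesMV.length];
    for (int i = 0; i < intValuesMV.length; i++) {
      double sum = 0;
      for (int value : intValuesMV[i]) {
        sum += value;
      }
      results[i] = sum;
    }
    return results;
  }
}
```

The trade-off is precision: a double represents every integer exactly only up to 2^53, so very large `LONG` sums can lose low-order digits, whereas a `long` result stays exact until it overflows.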

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArraySumTransformFunction.java
##########
@@ -0,0 +1,148 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.pinot.core.common.DataSource;
+import org.apache.pinot.core.operator.blocks.ProjectionBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.util.ArrayCopyUtils;
+import org.apache.pinot.spi.data.FieldSpec;
+
+
+/**
+ * The ArraySumTransformFunction class implements array_sum function for multi-valued columns
+ *
+ * Sample queries:
+ * SELECT COUNT(*) FROM table WHERE array_sum(mvColumn) > 2
+ * SELECT COUNT(*) FROM table GROUP BY array_sum(mvColumn)
+ * SELECT SUM(array_sum(mvColumn)) FROM table
+ */
+public class ArraySumTransformFunction extends BaseTransformFunction {
+  public static final String FUNCTION_NAME = "array_sum";
+
+  private long[] _longResults;
+  private double[] _doubleResults;
+  private TransformFunction _argument;
+  private TransformResultMetadata _resultMetadata;
+
+  @Override
+  public String getName() {
+    return FUNCTION_NAME;
+  }
+
+  @Override
+  public void init(List<TransformFunction> arguments, Map<String, DataSource> dataSourceMap) {
+    // Check that there is only 1 argument
+    if (arguments.size() != 1) {
+      throw new IllegalArgumentException("Exactly 1 argument is required for ARRAY_SUM transform function");
+    }
+
+    // Check that the argument is a multi-valued column or transform function
+    TransformFunction firstArgument = arguments.get(0);
+    if (firstArgument instanceof LiteralTransformFunction || firstArgument.getResultMetadata().isSingleValue()) {
+      throw new IllegalArgumentException(
+          "The argument of ARRAY_SUM transform function must be a multi-valued column or a transform function");
+    }
+    FieldSpec.DataType resultDataType;
+    switch (firstArgument.getResultMetadata().getDataType()) {
+      case INT:
+      case LONG:
+        resultDataType = FieldSpec.DataType.LONG;
+        break;
+      case FLOAT:
+      case DOUBLE:
+        resultDataType = FieldSpec.DataType.DOUBLE;
+        break;
+      default:
+        throw new IllegalArgumentException(
+            "The argument of ARRAY_SUM transform function must be numeric");
+    }
+    _resultMetadata = new TransformResultMetadata(resultDataType, true, false);
+    _argument = firstArgument;
+  }
+
+  @Override
+  public TransformResultMetadata getResultMetadata() {
+    return _resultMetadata;
+  }
+
+  @Override
+  public long[] transformToLongValuesSV(ProjectionBlock projectionBlock) {
+    if (_longResults == null) {
+      _longResults = new long[DocIdSetPlanNode.MAX_DOC_PER_CALL];
+    }
+    int length = projectionBlock.getNumDocs();
+    long sumRes;
+    switch (_argument.getResultMetadata().getDataType()) {
+      case INT:

Review comment:
       For `INT`, use `_argument.transformToIntValuesMV(projectionBlock)` for better performance (it avoids extra casting). The same applies to `FLOAT`.
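
The difference can be sketched with plain arrays. This assumes that asking an `INT` argument for long values forces a value-by-value conversion into a separate array before the sum starts; `IntSumSketch` and both method names are hypothetical.

```java
// Hypothetical sketch contrasting the two access patterns on plain arrays.
final class IntSumSketch {
  // Direct path: read the int[][] block as-is; each int widens to long at the addition.
  static long[] sumDirect(int[][] intValuesMV) {
    long[] results = new long[intValuesMV.length];
    for (int i = 0; i < intValuesMV.length; i++) {
      long sum = 0;
      for (int value : intValuesMV[i]) {
        sum += value;
      }
      results[i] = sum;
    }
    return results;
  }

  // Indirect path: copy every value into a long[] first (one extra pass and one
  // extra allocation per row), then sum the copy.
  static long[] sumViaLongCopy(int[][] intValuesMV) {
    long[] results = new long[intValuesMV.length];
    for (int i = 0; i < intValuesMV.length; i++) {
      int[] row = intValuesMV[i];
      long[] copy = new long[row.length];
      for (int j = 0; j < row.length; j++) {
        copy[j] = row[j];
      }
      long sum = 0;
      for (long value : copy) {
        sum += value;
      }
      results[i] = sum;
    }
    return results;
  }
}
```

Both methods return the same sums; the direct path simply skips the intermediate copy.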

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayAverageTransformFunction.java
##########
@@ -0,0 +1,127 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.pinot.core.common.DataSource;
+import org.apache.pinot.core.operator.blocks.ProjectionBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+
+
+/**
+ * The ArrayAverageTransformFunction class implements array_average function for multi-valued columns
+ *
+ * Sample queries:
+ * SELECT COUNT(*) FROM table WHERE array_average(mvColumn) > 2
+ * SELECT COUNT(*) FROM table GROUP BY array_average(mvColumn)
+ * SELECT SUM(array_average(mvColumn)) FROM table
+ */
+public class ArrayAverageTransformFunction extends BaseTransformFunction {
+  public static final String FUNCTION_NAME = "array_average";
+
+  private double[] _results;
+  private TransformFunction _argument;
+
+  @Override
+  public String getName() {
+    return FUNCTION_NAME;
+  }
+
+  @Override
+  public void init(List<TransformFunction> arguments, Map<String, DataSource> dataSourceMap) {
+    // Check that there is only 1 argument
+    if (arguments.size() != 1) {
+      throw new IllegalArgumentException("Exactly 1 argument is required for ARRAY_AVERAGE transform function");
+    }
+
+    // Check that the argument is a multi-valued column or transform function
+    TransformFunction firstArgument = arguments.get(0);
+    if (firstArgument instanceof LiteralTransformFunction || firstArgument.getResultMetadata().isSingleValue()) {
+      throw new IllegalArgumentException(
+          "The argument of ARRAY_AVERAGE transform function must be a multi-valued column or a transform function");
+    }
+    if (!firstArgument.getResultMetadata().getDataType().isNumeric()) {
+      throw new IllegalArgumentException(
+          "The argument of ARRAY_AVERAGE transform function must be numeric");
+    }
+    _argument = firstArgument;
+  }
+
+  @Override
+  public TransformResultMetadata getResultMetadata() {
+    return DOUBLE_SV_NO_DICTIONARY_METADATA;
+  }
+
+  @Override
+  public double[] transformToDoubleValuesSV(ProjectionBlock projectionBlock) {
+    if (_results == null) {
+      _results = new double[DocIdSetPlanNode.MAX_DOC_PER_CALL];
+    }
+
+    int numDocs = projectionBlock.getNumDocs();
+    double sumRes;
+    switch (_argument.getResultMetadata().getDataType()) {
+      case INT:
+        int[][] intValuesMV = _argument.transformToIntValuesMV(projectionBlock);
+        for (int i = 0; i < numDocs; i++) {
+          sumRes = 0;
+          for (int j = 0; j < intValuesMV[i].length; j++) {

Review comment:
       (nit) Cache `intValuesMV[i].length`; same for the other places.
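
A short sketch of the cached-length shape, again on plain arrays. `ArrayAverageSketch` and `average` are hypothetical names, and the handling of empty rows below is a property of this sketch, not a statement about the PR.

```java
// Hypothetical sketch; plain arrays stand in for the projection block.
final class ArrayAverageSketch {
  static double[] average(int[][] intValuesMV) {
    double[] results = new double[intValuesMV.length];
    for (int i = 0; i < intValuesMV.length; i++) {
      // Read the row and its length once instead of re-indexing intValuesMV[i]
      // on every iteration of the inner loop.
      int[] values = intValuesMV[i];
      int numValues = values.length;
      // The accumulator lives inside the loop, scoped to a single row.
      double sum = 0;
      for (int j = 0; j < numValues; j++) {
        sum += values[j];
      }
      // An empty row divides 0.0 by 0 and yields NaN in this sketch.
      results[i] = sum / numValues;
    }
    return results;
  }
}
```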

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayAverageTransformFunction.java
##########
@@ -0,0 +1,127 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.pinot.core.common.DataSource;
+import org.apache.pinot.core.operator.blocks.ProjectionBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+
+
+/**
+ * The ArrayAverageTransformFunction class implements array_average function for multi-valued columns
+ *
+ * Sample queries:
+ * SELECT COUNT(*) FROM table WHERE array_average(mvColumn) > 2
+ * SELECT COUNT(*) FROM table GROUP BY array_average(mvColumn)
+ * SELECT SUM(array_average(mvColumn)) FROM table
+ */
+public class ArrayAverageTransformFunction extends BaseTransformFunction {
+  public static final String FUNCTION_NAME = "array_average";
+
+  private double[] _results;
+  private TransformFunction _argument;
+
+  @Override
+  public String getName() {
+    return FUNCTION_NAME;
+  }
+
+  @Override
+  public void init(List<TransformFunction> arguments, Map<String, DataSource> dataSourceMap) {
+    // Check that there is only 1 argument
+    if (arguments.size() != 1) {
+      throw new IllegalArgumentException("Exactly 1 argument is required for ARRAY_AVERAGE transform function");
+    }
+
+    // Check that the argument is a multi-valued column or transform function
+    TransformFunction firstArgument = arguments.get(0);
+    if (firstArgument instanceof LiteralTransformFunction || firstArgument.getResultMetadata().isSingleValue()) {
+      throw new IllegalArgumentException(
+          "The argument of ARRAY_AVERAGE transform function must be a multi-valued column or a transform function");
+    }
+    if (!firstArgument.getResultMetadata().getDataType().isNumeric()) {
+      throw new IllegalArgumentException(
+          "The argument of ARRAY_AVERAGE transform function must be numeric");
+    }
+    _argument = firstArgument;
+  }
+
+  @Override
+  public TransformResultMetadata getResultMetadata() {
+    return DOUBLE_SV_NO_DICTIONARY_METADATA;
+  }
+
+  @Override
+  public double[] transformToDoubleValuesSV(ProjectionBlock projectionBlock) {
+    if (_results == null) {
+      _results = new double[DocIdSetPlanNode.MAX_DOC_PER_CALL];
+    }
+
+    int numDocs = projectionBlock.getNumDocs();
+    double sumRes;

Review comment:
       (nit) Move the declaration into the for loop for better readability

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ArrayMaxTransformFunction.java
##########
@@ -0,0 +1,179 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.transform.function;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pinot.core.common.DataSource;
+import org.apache.pinot.core.operator.blocks.ProjectionBlock;
+import org.apache.pinot.core.operator.transform.TransformResultMetadata;
+import org.apache.pinot.core.plan.DocIdSetPlanNode;
+import org.apache.pinot.core.util.ArrayCopyUtils;
+import org.apache.pinot.spi.data.FieldSpec;
+
+
+/**
+ * The ArrayMaxTransformFunction class implements array_max function for multi-valued columns
+ *
+ * Sample queries:
+ * SELECT COUNT(*) FROM table WHERE array_max(mvColumn) > 2
+ * SELECT COUNT(*) FROM table GROUP BY array_max(mvColumn)
+ * SELECT SUM(array_max(mvColumn)) FROM table
+ */
+public class ArrayMaxTransformFunction extends BaseTransformFunction {
+  public static final String FUNCTION_NAME = "array_max";
+
+  private int[] _intValuesSV;
+  private long[] _longValuesSV;
+  private float[] _floatValuesSV;
+  private double[] _doubleValuesSV;
+  private String[] _stringValuesSV;
+  private TransformFunction _argument;
+  private TransformResultMetadata _resultMetadata;
+
+  @Override
+  public String getName() {
+    return FUNCTION_NAME;
+  }
+
+  @Override
+  public void init(List<TransformFunction> arguments, Map<String, DataSource> dataSourceMap) {
+    // Check that there is only 1 argument
+    if (arguments.size() != 1) {
+      throw new IllegalArgumentException("Exactly 1 argument is required for ARRAY_MAX transform function");
+    }
+
+    // Check that the argument is a multi-valued column or transform function
+    TransformFunction firstArgument = arguments.get(0);
+    if (firstArgument instanceof LiteralTransformFunction || firstArgument.getResultMetadata().isSingleValue()) {
+      throw new IllegalArgumentException(
+          "The argument of ARRAY_MAX transform function must be a multi-valued column or a transform function");
+    }
+    _resultMetadata = new TransformResultMetadata(firstArgument.getResultMetadata().getDataType(), true, false);
+    _argument = firstArgument;
+  }
+
+  @Override
+  public TransformResultMetadata getResultMetadata() {
+    return _resultMetadata;
+  }
+
+  @Override
+  public int[] transformToIntValuesSV(ProjectionBlock projectionBlock) {
+    if (_argument.getResultMetadata().getDataType() != FieldSpec.DataType.INT) {
+      return super.transformToIntValuesSV(projectionBlock);
+    }
+    if (_intValuesSV == null) {
+      _intValuesSV = new int[DocIdSetPlanNode.MAX_DOC_PER_CALL];
+    }
+    int length = projectionBlock.getNumDocs();
+    int[][] intValuesMV = _argument.transformToIntValuesMV(projectionBlock);
+    for (int i = 0; i < length; i++) {
+      int maxRes = Integer.MIN_VALUE;
+      for (int j = 0; j < intValuesMV[i].length; j++) {
+        maxRes = Math.max(maxRes, intValuesMV[i][j]);
+      }

Review comment:
       (nit) Same for other places
   ```suggestion
         for (int value : intValuesMV[i]) {
           maxRes = Math.max(maxRes, value);
         }
   ```
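
For reference, the suggested loop in a self-contained form. `ArrayMaxSketch` and `maxPerRow` are hypothetical names, and plain arrays stand in for the projection block.

```java
// Hypothetical sketch of the enhanced-for variant on plain arrays.
final class ArrayMaxSketch {
  static int[] maxPerRow(int[][] intValuesMV) {
    int[] results = new int[intValuesMV.length];
    for (int i = 0; i < intValuesMV.length; i++) {
      int maxRes = Integer.MIN_VALUE;
      // The enhanced for loop evaluates intValuesMV[i] once and needs no index variable.
      for (int value : intValuesMV[i]) {
        maxRes = Math.max(maxRes, value);
      }
      results[i] = maxRes;
    }
    return results;
  }
}
```

One thing to keep in mind with this shape: an empty row never enters the inner loop, so its result stays at `Integer.MIN_VALUE`.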






[GitHub] [incubator-pinot] fx19880617 merged pull request #6084: Adding array transform functions: array_average, array_max, array_min, array_sum

Posted by GitBox <gi...@apache.org>.
fx19880617 merged pull request #6084:
URL: https://github.com/apache/incubator-pinot/pull/6084


   

