You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/22 03:47:48 UTC
[GitHub] [druid] clintropolis commented on a change in pull request #10401: vectorized expressions and expression virtual columns

clintropolis commented on a change in pull request #10401:
URL: https://github.com/apache/druid/pull/10401#discussion_r492457149



##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -148,6 +170,60 @@ default ExprType getOutputType(InputBindingTypes inputTypes)
   {
     @Nullable
     ExprType getType(String name);
+
+    default boolean areNumeric(List<Expr> args)
+    {
+      boolean numeric = args.size() > 0;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (argType == null) {
+          numeric = false;
+          break;
+        }
+        numeric &= argType.isNumeric();
+      }
+      return numeric;
+    }
+
+    default boolean areNumeric(Expr... args)
+    {
+      boolean numeric = args.length > 0;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (argType == null) {
+          numeric = false;
+          break;
+        }
+        numeric &= argType.isNumeric();
+      }
+      return numeric;

Review comment:
       done

##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -148,6 +170,60 @@ default ExprType getOutputType(InputBindingTypes inputTypes)
   {
     @Nullable
     ExprType getType(String name);
+
+    default boolean areNumeric(List<Expr> args)
+    {
+      boolean numeric = args.size() > 0;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (argType == null) {
+          numeric = false;
+          break;
+        }
+        numeric &= argType.isNumeric();
+      }
+      return numeric;
+    }
+
+    default boolean areNumeric(Expr... args)
+    {
+      boolean numeric = args.length > 0;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (argType == null) {
+          numeric = false;
+          break;
+        }
+        numeric &= argType.isNumeric();
+      }
+      return numeric;
+    }
+
+    default boolean canVectorize(List<Expr> args)
+    {
+      boolean canVectorize = true;
+      for (Expr arg : args) {
+        canVectorize &= arg.canVectorize(this);
+      }
+      return canVectorize;
+    }
+
+    default boolean canVectorize(Expr... args)
+    {
+      boolean canVectorize = true;

Review comment:
       done

##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -148,6 +170,60 @@ default ExprType getOutputType(InputBindingTypes inputTypes)
   {
     @Nullable
     ExprType getType(String name);
+
+    default boolean areNumeric(List<Expr> args)
+    {
+      boolean numeric = args.size() > 0;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (argType == null) {
+          numeric = false;
+          break;
+        }
+        numeric &= argType.isNumeric();
+      }
+      return numeric;
+    }
+
+    default boolean areNumeric(Expr... args)
+    {
+      boolean numeric = args.length > 0;
+      for (Expr arg : args) {
+        ExprType argType = arg.getOutputType(this);
+        if (argType == null) {
+          numeric = false;
+          break;
+        }
+        numeric &= argType.isNumeric();
+      }
+      return numeric;
+    }
+
+    default boolean canVectorize(List<Expr> args)
+    {
+      boolean canVectorize = true;
+      for (Expr arg : args) {
+        canVectorize &= arg.canVectorize(this);
+      }
+      return canVectorize;
+    }
+
+    default boolean canVectorize(Expr... args)
+    {
+      boolean canVectorize = true;
+      for (Expr arg : args) {
+        canVectorize &= arg.canVectorize(this);
+      }
+      return canVectorize;
+    }
+  }
+
+  /**
+   * {@link InputBindingTypes} + vectorizations stuff for {@link #buildVectorized}
+   */
+  interface VectorInputBindingTypes extends InputBindingTypes

Review comment:
       I think `VectorInputBindingTypes` and `VectorInputBinding` could be consolidated. They exist in split form from an earlier prototype before I had expression output type inference using `InputBindingTypes` existed. 
   
   Conceptually it sort of makes sense to me to have them split because you don't really need a full binding backed by selectors in order to build a vectorized expression processor, just the input types and the max vector size. But, there isn't currently a need for this, so I will consider/look into consolidating these interfaces.
   
   It probably does make sense to consider moving `VectorSizeInspector` as well, but I would rather not move into `core` from `processing` in this PR.

##########
File path: core/src/main/java/org/apache/druid/math/expr/ExprEval.java
##########
@@ -121,6 +121,23 @@ public static ExprEval bestEffortOf(@Nullable Object val)
     return new StringExprEval(val == null ? null : String.valueOf(val));
   }
 
+  @Nullable
+  public static Number computeNumber(@Nullable String value)
+  {
+    if (value == null) {
+      return null;
+    }
+    Number rv;
+    Long v = GuavaUtils.tryParseLong(value);

Review comment:
       I agree that we should look into consolidating these methods, not in this PR though.

##########
File path: core/src/main/java/org/apache/druid/math/expr/ExprEval.java
##########
@@ -121,6 +121,23 @@ public static ExprEval bestEffortOf(@Nullable Object val)
     return new StringExprEval(val == null ? null : String.valueOf(val));
   }
 
+  @Nullable
+  public static Number computeNumber(@Nullable String value)

Review comment:
       Maybe, but I think I would rather save this until a follow-up PR when we look into consolidating these redundant methods re: the other comment

##########
File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##########
@@ -68,6 +70,17 @@ public ExprType getOutputType(InputBindingTypes inputTypes)
     }
     return implicitCast;
   }
+  @Override
+  public boolean canVectorize(InputBindingTypes inputTypes)
+  {
+    return inputTypes.areNumeric(left, right) && inputTypes.canVectorize(left, right);

Review comment:
       Yeah, I was going to clean this up once all operators are implemented.

##########
File path: core/src/main/java/org/apache/druid/math/expr/Function.java
##########
@@ -517,6 +532,24 @@ public ExprEval apply(List<Expr> args, Expr.ObjectBinding bindings)
 
       return ExprEval.of(retVal);
     }
+
+    @Override
+    public boolean canVectorize(Expr.InputBindingTypes inputTypes, List<Expr> args)
+    {
+      return (args.size() == 1 || (args.get(1).isLiteral() && args.get(1).getLiteralValue() instanceof Number)) &&
+             inputTypes.canVectorize(args);
+    }
+
+    @Override
+    public <T> ExprVectorProcessor<T> asVectorProcessor(Expr.VectorInputBindingTypes inputTypes, List<Expr> args)
+    {
+      if (args.size() == 1 || args.get(1).isLiteral()) {
+        final int radix = args.size() == 1 ? 10 : ((Number) args.get(1).getLiteralValue()).intValue();
+        return VectorProcessors.parseLong(inputTypes, args.get(0), radix);
+      }
+      // not yet implemented, how did we get here

Review comment:
       I don't think it needs to be resolved in this PR. This comment just refers to that the variable input radix (second argument is not a constant) case is not implemented, but `canVectorize` will return false, so this line should not be reached.

##########
File path: core/src/main/java/org/apache/druid/math/expr/vector/UnivariateFunctionVectorProcessor.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.math.expr.vector;
+
+import org.apache.druid.math.expr.Expr;
+
+/**
+ * common machinery for processing single input operators and functions, which should always treat null input as null
+ * output, and are backed by a primitive value instead of an object value (and need to use the null vector instead of
+ * checking the vector itself for nulls)

Review comment:
       Eh, is it necessary since strings vectors are object vectors? (and so are the array types, which I think will be handled by similar processors)

##########
File path: core/src/main/java/org/apache/druid/math/expr/vector/VectorProcessors.java
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.math.expr.vector;
+
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprType;
+
+import javax.annotation.Nullable;
+import java.util.Arrays;
+
+public class VectorProcessors
+{
+  public static <T> ExprVectorProcessor<T> constantString(@Nullable String constant, int maxVectorSize)
+  {
+    final String[] strings = new String[maxVectorSize];
+    Arrays.fill(strings, constant);
+    final ExprEvalStringVector eval = new ExprEvalStringVector(strings);
+    return new ExprVectorProcessor<T>()
+    {
+      @Override
+      public ExprEvalVector<T> evalVector(Expr.VectorInputBinding bindings)
+      {
+        return (ExprEvalVector<T>) eval;
+      }
+
+      @Override
+      public ExprType getOutputType()
+      {
+        return ExprType.STRING;
+      }
+    };
+  }
+
+  public static <T> ExprVectorProcessor<T> constantDouble(@Nullable Double constant, int maxVectorSize)
+  {
+    final double[] doubles = new double[maxVectorSize];
+    final boolean[] nulls;
+    if (constant == null) {
+      nulls = new boolean[maxVectorSize];
+      Arrays.fill(nulls, true);
+    } else {
+      nulls = null;
+      Arrays.fill(doubles, constant);
+    }
+    final ExprEvalDoubleVector eval = new ExprEvalDoubleVector(doubles, nulls);
+    return new ExprVectorProcessor<T>()
+    {
+      @Override
+      public ExprEvalVector<T> evalVector(Expr.VectorInputBinding bindings)
+      {
+        return (ExprEvalVector<T>) eval;
+      }
+
+      @Override
+      public ExprType getOutputType()
+      {
+        return ExprType.DOUBLE;
+      }
+    };
+  }
+
+  public static <T> ExprVectorProcessor<T> constantLong(@Nullable Long constant, int maxVectorSize)
+  {
+    final long[] longs = new long[maxVectorSize];
+    final boolean[] nulls;
+    if (constant == null) {
+      nulls = new boolean[maxVectorSize];
+      Arrays.fill(nulls, true);
+    } else {
+      nulls = null;
+      Arrays.fill(longs, constant);
+    }
+    final ExprEvalLongVector eval = new ExprEvalLongVector(longs, nulls);
+    return new ExprVectorProcessor<T>()
+    {
+      @Override
+      public ExprEvalVector<T> evalVector(Expr.VectorInputBinding bindings)
+      {
+        return (ExprEvalVector<T>) eval;
+      }
+
+      @Override
+      public ExprType getOutputType()
+      {
+        return ExprType.LONG;
+      }
+    };
+  }
+
+  public static <T> ExprVectorProcessor<T> parseLong(Expr.VectorInputBindingTypes inputTypes, Expr arg, int radix)
+  {
+    final ExprVectorProcessor<?> processor = new LongOutStringInFunctionVectorProcessor(
+        CastToTypeVectorProcessor.castToType(arg.buildVectorized(inputTypes), ExprType.STRING),
+        inputTypes.getMaxVectorSize()
+    )
+    {
+      @Override
+      public void processIndex(String[] strings, long[] longs, boolean[] outputNulls, int i)
+      {
+        try {
+          longs[i] = Long.parseLong(strings[i], radix);
+          outputNulls[i] = false;

Review comment:
       Oops yes, I stuck the constant 2 argument version in last minute and forgot about this :+1:

##########
File path: processing/src/main/java/org/apache/druid/query/aggregation/AggregatorUtil.java
##########
@@ -225,6 +229,22 @@ public boolean isNull()
     }
   }
 
+  public static VectorValueSelector makeVectorValueSelector(
+      VectorColumnSelectorFactory columnSelectorFactory,
+      String fieldName,
+      String expression,

Review comment:
       added

##########
File path: processing/src/main/java/org/apache/druid/segment/virtual/ExpressionPlanner.java
##########
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.virtual;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Iterables;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.math.expr.Parser;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ValueType;
+
+import java.util.EnumSet;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+public class ExpressionPlanner
+{
+  private ExpressionPlanner()
+  {
+    // No instantiation.
+  }
+
+  /**
+   * Druid tries to be chill to expressions to make up for not having a well defined table schema across segments. This
+   * method performs some analysis to determine what sort of selectors can be constructed on top of an expression,
+   * whether or not the expression will need implicitly mapped across multi-valued inputs, if the expression produces
+   * multi-valued outputs, is vectorizable, and everything else interesting when making a selector.
+   *
+   * Results are stored in a {@link ExpressionPlan}, which can be examined to do whatever is necessary to make things
+   * function properly.
+   */
+  public static ExpressionPlan plan(ColumnInspector inspector, Expr expression)
+  {
+    final Expr.BindingAnalysis analysis = expression.analyzeInputs();
+    Parser.validateExpr(expression, analysis);
+
+    EnumSet<ExpressionPlan.Trait> traits = EnumSet.noneOf(ExpressionPlan.Trait.class);
+    Set<String> maybeMultiValued = new HashSet<>();
+    List<String> needsApplied = ImmutableList.of();
+    ValueType singleInputType = null;
+    ExprType outputType = null;
+
+    final Set<String> columns = analysis.getRequiredBindings();
+
+    // check and set traits which allow optimized selectors to be created
+    if (columns.isEmpty()) {
+      traits.add(ExpressionPlan.Trait.CONSTANT);
+    } else if (columns.size() == 1) {
+      final String column = Iterables.getOnlyElement(columns);
+      final ColumnCapabilities capabilities = inspector.getColumnCapabilities(column);
+
+      // These flags allow for selectors that wrap a single underlying column to be optimized, through caching results
+      // and via allowing deferred execution in the case of building dimension selectors.
+      //    SINGLE_INPUT_SCALAR
+      // is set if an input is single valued, and the output is definitely single valued, with an additional requirement
+      // for strings that the column is dictionary encoded.
+      //    SINGLE_INPUT_MAPPABLE
+      // is set when a single input string column, which can be multi-valued, but if so, it must be implicitly mappable
+      // (i.e. the expression is not treating its input as an array and not wanting to output an array)
+      if (capabilities != null) {
+        boolean isSingleInputMappable = false;
+        boolean isSingleInputScalar = capabilities.hasMultipleValues().isFalse() &&
+                                      !analysis.hasInputArrays() &&
+                                      !analysis.isOutputArray();
+        if (capabilities.getType() == ValueType.STRING) {
+          isSingleInputScalar &= capabilities.isDictionaryEncoded().isTrue();
+          isSingleInputMappable = capabilities.isDictionaryEncoded().isTrue() &&
+                                  !capabilities.hasMultipleValues().isUnknown() &&

Review comment:
       No, this is what it was doing in the previous check, looking for either explicitly true or false, which are both ok, but I can't quite remember why unknown isn't ok... 
   
   But, this does match the previous logic in `ExpressionSelectors.makeDimensionSelector` for checking single column string inputs to determine if it can use `SingleStringInputDimensionSelector`, so I would rather not change it as part of this PR

##########
File path: benchmarks/src/test/java/org/apache/druid/benchmark/ExpressionVectorSelectorBenchmark.java
##########
@@ -0,0 +1,301 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.benchmark;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Sequence;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.math.expr.Parser;
+import org.apache.druid.query.expression.TestExprMacroTable;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.QueryableIndexStorageAdapter;
+import org.apache.druid.segment.VirtualColumns;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.generator.GeneratorBasicSchemas;
+import org.apache.druid.segment.generator.GeneratorSchemaInfo;
+import org.apache.druid.segment.generator.SegmentGenerator;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Assert;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+
+@State(Scope.Benchmark)
+@Fork(value = 1)
+@Warmup(iterations = 3)
+@Measurement(iterations = 5)
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+public class ExpressionVectorSelectorBenchmark
+{
+  static {
+    NullHandling.initializeForTests();
+  }
+
+  @Param({"1000000"})
+  private int rowsPerSegment;
+
+  @Param({"false", "true"})
+  private boolean vectorize;
+
+  @Param({
+      "long1 * long2",
+      "double1 * double3",
+      "float1 + float3",
+      "(long1 - long4) / double3",
+      "max(double3, double5)",
+      "min(double4, double1)",
+      "cos(float3)",
+      "sin(long4)",
+      "parse_long(string1)",
+      "parse_long(string1) * double3",
+      "parse_long(string5) * parse_long(string1)",
+      "parse_long(string5) * parse_long(string1) * double3"
+  })
+  private String expression;
+
+  private QueryableIndex index;
+  private Closer closer;
+
+  @Nullable
+  private ExprType outputType;
+
+  @Setup(Level.Trial)
+  public void setup()
+  {
+    this.closer = Closer.create();
+
+    final GeneratorSchemaInfo schemaInfo = GeneratorBasicSchemas.SCHEMA_MAP.get("expression-testbench");
+
+    final DataSegment dataSegment = DataSegment.builder()
+                                               .dataSource("foo")
+                                               .interval(schemaInfo.getDataInterval())
+                                               .version("1")
+                                               .shardSpec(new LinearShardSpec(0))
+                                               .size(0)
+                                               .build();
+
+    final SegmentGenerator segmentGenerator = closer.register(new SegmentGenerator());
+    this.index = closer.register(
+        segmentGenerator.generate(dataSegment, schemaInfo, Granularities.HOUR, rowsPerSegment)
+    );
+
+    Expr parsed = Parser.parse(expression, ExprMacroTable.nil());
+    outputType = parsed.getOutputType(
+        new ColumnInspector()
+        {
+          @Nullable
+          @Override
+          public ColumnCapabilities getColumnCapabilities(String column)
+          {
+            return QueryableIndexStorageAdapter.getColumnCapabilities(index, column);
+          }
+        }
+    );
+    checkSanity();
+  }
+
+  @TearDown(Level.Trial)
+  public void tearDown() throws Exception
+  {
+    closer.close();
+  }
+
+  @Benchmark
+  @BenchmarkMode(Mode.AverageTime)
+  @OutputTimeUnit(TimeUnit.MILLISECONDS)
+  public void scan(Blackhole blackhole)
+  {
+    final VirtualColumns virtualColumns = VirtualColumns.create(
+        ImmutableList.of(
+            new ExpressionVirtualColumn(
+                "v",
+                expression,
+                ExprType.toValueType(outputType),
+                TestExprMacroTable.INSTANCE
+            )
+        )
+    );
+    if (vectorize) {
+      VectorCursor cursor = new QueryableIndexStorageAdapter(index).makeVectorCursor(
+          null,
+          index.getDataInterval(),
+          virtualColumns,
+          false,
+          512,
+          null
+      );
+      if (outputType.isNumeric()) {
+        VectorValueSelector selector = cursor.getColumnSelectorFactory().makeValueSelector("v");
+        if (outputType.equals(ExprType.DOUBLE)) {
+          while (!cursor.isDone()) {
+            blackhole.consume(selector.getDoubleVector());
+            blackhole.consume(selector.getNullVector());
+            cursor.advance();
+          }
+        } else {
+          while (!cursor.isDone()) {
+            blackhole.consume(selector.getLongVector());
+            blackhole.consume(selector.getNullVector());
+            cursor.advance();
+          }
+        }
+        closer.register(cursor);
+      }
+    } else {
+      Sequence<Cursor> cursors = new QueryableIndexStorageAdapter(index).makeCursors(
+          null,
+          index.getDataInterval(),
+          virtualColumns,
+          Granularities.ALL,
+          false,
+          null
+      );
+
+      int rowCount = cursors
+          .map(cursor -> {
+            final ColumnValueSelector selector = cursor.getColumnSelectorFactory().makeColumnValueSelector("v");
+            int rows = 0;
+            while (!cursor.isDone()) {
+              blackhole.consume(selector.getObject());
+              rows++;
+              cursor.advance();
+            }
+            return rows;
+          }).accumulate(0, (acc, in) -> acc + in);
+
+      blackhole.consume(rowCount);
+    }
+  }
+
+  private void checkSanity()
+  {
+    final List<Object> results = new ArrayList<>(rowsPerSegment);
+    final VirtualColumns virtualColumns = VirtualColumns.create(
+        ImmutableList.of(
+            new ExpressionVirtualColumn(
+                "v",
+                expression,
+                ExprType.toValueType(outputType),
+                TestExprMacroTable.INSTANCE
+            )
+        )
+    );
+    VectorCursor cursor = new QueryableIndexStorageAdapter(index).makeVectorCursor(
+        null,
+        index.getDataInterval(),
+        virtualColumns,
+        false,
+        512,
+        null
+    );
+
+    VectorValueSelector selector = null;
+    VectorObjectSelector objectSelector = null;
+    if (outputType.isNumeric()) {
+      selector = cursor.getColumnSelectorFactory().makeValueSelector("v");
+    } else {
+      objectSelector = cursor.getColumnSelectorFactory().makeObjectSelector("v");
+    }
+    int rowCount = 0;
+    while (!cursor.isDone()) {
+      boolean[] nulls;
+      switch (outputType) {
+        case LONG:
+          nulls = selector.getNullVector();
+          long[] longs = selector.getLongVector();
+          for (int i = 0; i < selector.getCurrentVectorSize(); i++, rowCount++) {
+            results.add(nulls != null && nulls[i] ? null : longs[i]);
+          }
+          break;
+        case DOUBLE:
+          nulls = selector.getNullVector();
+          double[] doubles = selector.getDoubleVector();
+          for (int i = 0; i < selector.getCurrentVectorSize(); i++, rowCount++) {
+            results.add(nulls != null && nulls[i] ? null : doubles[i]);
+          }
+          break;
+        case STRING:
+          Object[] objects = objectSelector.getObjectVector();
+          for (int i = 0; i < objectSelector.getCurrentVectorSize(); i++, rowCount++) {
+            results.add(objects[i]);
+          }
+          break;
+      }
+
+      cursor.advance();
+    }
+    closer.register(cursor);
+
+    Sequence<Cursor> cursors = new QueryableIndexStorageAdapter(index).makeCursors(
+        null,
+        index.getDataInterval(),
+        virtualColumns,
+        Granularities.ALL,
+        false,
+        null
+    );
+
+    int rowCountCursor = cursors
+        .map(nonVectorized -> {
+          final ColumnValueSelector nonSelector = nonVectorized.getColumnSelectorFactory().makeColumnValueSelector("v");
+          int rows = 0;
+          while (!nonVectorized.isDone()) {
+            Assert.assertEquals(StringUtils.format("Failed at row %s", rows), nonSelector.getObject(), results.get(rows));
+            rows++;
+            nonVectorized.advance();
+          }
+          return rows;
+        }).accumulate(0, (acc, in) -> acc + in);
+
+    Assert.assertTrue(rowCountCursor > 0);

Review comment:
       oops yes, added, but moved this entire check into a newly added `ExpressionVectorSelectorsTest`.

##########
File path: benchmarks/src/test/java/org/apache/druid/benchmark/ExpressionVectorSelectorBenchmark.java
##########
@@ -0,0 +1,301 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.benchmark;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Sequence;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.math.expr.Parser;
+import org.apache.druid.query.expression.TestExprMacroTable;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.QueryableIndexStorageAdapter;
+import org.apache.druid.segment.VirtualColumns;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.generator.GeneratorBasicSchemas;
+import org.apache.druid.segment.generator.GeneratorSchemaInfo;
+import org.apache.druid.segment.generator.SegmentGenerator;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Assert;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+
+@State(Scope.Benchmark)
+@Fork(value = 1)
+@Warmup(iterations = 3)
+@Measurement(iterations = 5)
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+public class ExpressionVectorSelectorBenchmark
+{
+  static {
+    NullHandling.initializeForTests();
+  }
+
+  @Param({"1000000"})
+  private int rowsPerSegment;
+
+  @Param({"false", "true"})
+  private boolean vectorize;
+
+  @Param({
+      "long1 * long2",
+      "double1 * double3",
+      "float1 + float3",
+      "(long1 - long4) / double3",
+      "max(double3, double5)",
+      "min(double4, double1)",
+      "cos(float3)",
+      "sin(long4)",
+      "parse_long(string1)",
+      "parse_long(string1) * double3",
+      "parse_long(string5) * parse_long(string1)",
+      "parse_long(string5) * parse_long(string1) * double3"
+  })
+  private String expression;
+
+  private QueryableIndex index;
+  private Closer closer;
+
+  @Nullable
+  private ExprType outputType;
+
+  @Setup(Level.Trial)
+  public void setup()
+  {
+    this.closer = Closer.create();
+
+    final GeneratorSchemaInfo schemaInfo = GeneratorBasicSchemas.SCHEMA_MAP.get("expression-testbench");
+
+    final DataSegment dataSegment = DataSegment.builder()
+                                               .dataSource("foo")
+                                               .interval(schemaInfo.getDataInterval())
+                                               .version("1")
+                                               .shardSpec(new LinearShardSpec(0))
+                                               .size(0)
+                                               .build();
+
+    final SegmentGenerator segmentGenerator = closer.register(new SegmentGenerator());
+    this.index = closer.register(
+        segmentGenerator.generate(dataSegment, schemaInfo, Granularities.HOUR, rowsPerSegment)
+    );
+
+    Expr parsed = Parser.parse(expression, ExprMacroTable.nil());
+    outputType = parsed.getOutputType(
+        new ColumnInspector()
+        {
+          @Nullable
+          @Override
+          public ColumnCapabilities getColumnCapabilities(String column)
+          {
+            return QueryableIndexStorageAdapter.getColumnCapabilities(index, column);
+          }
+        }
+    );
+    checkSanity();
+  }
+
+  @TearDown(Level.Trial)
+  public void tearDown() throws Exception
+  {
+    closer.close();
+  }
+
+  @Benchmark
+  @BenchmarkMode(Mode.AverageTime)
+  @OutputTimeUnit(TimeUnit.MILLISECONDS)
+  public void scan(Blackhole blackhole)
+  {
+    final VirtualColumns virtualColumns = VirtualColumns.create(
+        ImmutableList.of(
+            new ExpressionVirtualColumn(
+                "v",
+                expression,
+                ExprType.toValueType(outputType),
+                TestExprMacroTable.INSTANCE
+            )
+        )
+    );
+    if (vectorize) {
+      VectorCursor cursor = new QueryableIndexStorageAdapter(index).makeVectorCursor(
+          null,
+          index.getDataInterval(),
+          virtualColumns,
+          false,
+          512,
+          null
+      );
+      if (outputType.isNumeric()) {
+        VectorValueSelector selector = cursor.getColumnSelectorFactory().makeValueSelector("v");
+        if (outputType.equals(ExprType.DOUBLE)) {
+          while (!cursor.isDone()) {
+            blackhole.consume(selector.getDoubleVector());
+            blackhole.consume(selector.getNullVector());
+            cursor.advance();
+          }
+        } else {
+          while (!cursor.isDone()) {
+            blackhole.consume(selector.getLongVector());
+            blackhole.consume(selector.getNullVector());
+            cursor.advance();
+          }
+        }
+        closer.register(cursor);
+      }
+    } else {
+      Sequence<Cursor> cursors = new QueryableIndexStorageAdapter(index).makeCursors(
+          null,
+          index.getDataInterval(),
+          virtualColumns,
+          Granularities.ALL,
+          false,
+          null
+      );
+
+      int rowCount = cursors
+          .map(cursor -> {
+            final ColumnValueSelector selector = cursor.getColumnSelectorFactory().makeColumnValueSelector("v");
+            int rows = 0;
+            while (!cursor.isDone()) {
+              blackhole.consume(selector.getObject());
+              rows++;
+              cursor.advance();
+            }
+            return rows;
+          }).accumulate(0, (acc, in) -> acc + in);
+
+      blackhole.consume(rowCount);
+    }
+  }
+
+  private void checkSanity()

Review comment:
       added new test `ExpressionVectorSelectorsTest`

##########
File path: benchmarks/src/test/java/org/apache/druid/benchmark/ExpressionVectorSelectorBenchmark.java
##########
@@ -0,0 +1,301 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.benchmark;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Sequence;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.math.expr.Expr;
+import org.apache.druid.math.expr.ExprMacroTable;
+import org.apache.druid.math.expr.ExprType;
+import org.apache.druid.math.expr.Parser;
+import org.apache.druid.query.expression.TestExprMacroTable;
+import org.apache.druid.segment.ColumnInspector;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.Cursor;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.QueryableIndexStorageAdapter;
+import org.apache.druid.segment.VirtualColumns;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.generator.GeneratorBasicSchemas;
+import org.apache.druid.segment.generator.GeneratorSchemaInfo;
+import org.apache.druid.segment.generator.SegmentGenerator;
+import org.apache.druid.segment.vector.VectorCursor;
+import org.apache.druid.segment.vector.VectorObjectSelector;
+import org.apache.druid.segment.vector.VectorValueSelector;
+import org.apache.druid.segment.virtual.ExpressionVirtualColumn;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Assert;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+
+@State(Scope.Benchmark)
+@Fork(value = 1)
+@Warmup(iterations = 3)
+@Measurement(iterations = 5)

Review comment:
       Ah probably for super accurate results if I were making plots, but this was close enough for ballpark measurements while testing changes since the timing usually seemed to settle down after 2 warmup iterations.

##########
File path: benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlExpressionBenchmark.java
##########
@@ -0,0 +1,363 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.benchmark.query;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.ImmutableMap;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.druid.common.config.NullHandling;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.java.util.common.granularity.Granularities;
+import org.apache.druid.java.util.common.guava.Sequence;
+import org.apache.druid.java.util.common.guava.Yielder;
+import org.apache.druid.java.util.common.guava.Yielders;
+import org.apache.druid.java.util.common.io.Closer;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.query.DruidProcessingConfig;
+import org.apache.druid.query.QueryRunnerFactoryConglomerate;
+import org.apache.druid.segment.QueryableIndex;
+import org.apache.druid.segment.generator.GeneratorBasicSchemas;
+import org.apache.druid.segment.generator.GeneratorSchemaInfo;
+import org.apache.druid.segment.generator.SegmentGenerator;
+import org.apache.druid.server.QueryStackTests;
+import org.apache.druid.server.security.AuthTestUtils;
+import org.apache.druid.server.security.AuthenticationResult;
+import org.apache.druid.server.security.NoopEscalator;
+import org.apache.druid.sql.calcite.planner.Calcites;
+import org.apache.druid.sql.calcite.planner.DruidPlanner;
+import org.apache.druid.sql.calcite.planner.PlannerConfig;
+import org.apache.druid.sql.calcite.planner.PlannerFactory;
+import org.apache.druid.sql.calcite.planner.PlannerResult;
+import org.apache.druid.sql.calcite.util.CalciteTests;
+import org.apache.druid.sql.calcite.util.SpecificSegmentsQuerySegmentWalker;
+import org.apache.druid.timeline.DataSegment;
+import org.apache.druid.timeline.partition.LinearShardSpec;
+import org.junit.Assert;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Level;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.TearDown;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import javax.annotation.Nullable;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Benchmark that tests various SQL queries.
+ */
+@State(Scope.Benchmark)
+@Fork(value = 1)
+@Warmup(iterations = 3)
+@Measurement(iterations = 5)
+public class SqlExpressionBenchmark
+{
+  private static final Logger log = new Logger(SqlExpressionBenchmark.class);
+
+  static {
+    NullHandling.initializeForTests();
+    Calcites.setSystemProperties();
+  }
+
+  private static final DruidProcessingConfig PROCESSING_CONFIG = new DruidProcessingConfig()
+  {
+    @Override
+    public int intermediateComputeSizeBytes()
+    {
+      return 512 * 1024 * 1024;
+    }
+
+    @Override
+    public int getNumMergeBuffers()
+    {
+      return 3;
+    }
+
+    @Override
+    public int getNumThreads()
+    {
+      return 1;
+    }
+
+    @Override
+    public boolean useParallelMergePoolConfigured()
+    {
+      return true;
+    }
+
+    @Override
+    public String getFormatString()
+    {
+      return "benchmarks-processing-%s";
+    }
+  };
+
+
+  private static final List<String> QUERIES = ImmutableList.of(
+      // ===========================
+      // non-expression reference queries
+      // ===========================
+      // 0: non-expression timeseries reference, 1 columns
+      "SELECT SUM(long1) FROM foo",
+      // 1: non-expression timeseries reference, 2 columns
+      "SELECT SUM(long1), SUM(long2) FROM foo",
+      // 2: non-expression timeseries reference, 3 columns
+      "SELECT SUM(long1), SUM(long4), SUM(double1) FROM foo",
+      // 3: non-expression timeseries reference, 4 columns
+      "SELECT SUM(long1), SUM(long4), SUM(double1), SUM(float3) FROM foo",
+      // 4: non-expression timeseries reference, 5 columns
+      "SELECT SUM(long1), SUM(long4), SUM(double1), SUM(float3), SUM(long5) FROM foo",
+      // 5: group by non-expr with 1 agg
+      "SELECT string2, SUM(long1) FROM foo GROUP BY 1 ORDER BY 2",
+      // 6: group by non-expr with 2 agg
+      "SELECT string2, SUM(long1), SUM(double3) FROM foo GROUP BY 1 ORDER BY 2",
+      // ===========================
+      // expressions
+      // ===========================
+      // 7: math op - 2 longs
+      "SELECT SUM(long1 * long2) FROM foo",
+      // 8: mixed math - 2 longs, 1 double
+      "SELECT SUM((long1 * long2) / double1) FROM foo",
+      // 9: mixed math - 2 longs, 1 double, 1 float
+      "SELECT SUM(float3 + ((long1 * long4)/double1)) FROM foo",
+      // 10: mixed math - 3 longs, 1 double, 1 float
+      "SELECT SUM(long5 - (float3 + ((long1 * long4)/double1))) FROM foo",
+      // 11: all same math op - 3 longs, 1 double, 1 float
+      "SELECT SUM(long5 * float3 * long1 * long4 * double1) FROM foo",
+      // 12: cos
+      "SELECT cos(double2) FROM foo",
+      // 13: unary negate
+      "SELECT SUM(-long4) FROM foo",
+      // 14: string long
+      "SELECT SUM(PARSE_LONG(string1)) FROM foo",
+      // 15: string longer
+      "SELECT SUM(PARSE_LONG(string3)) FROM foo",
+      // 16: time floor, non-expr col + reg agg
+      "SELECT TIME_FLOOR(__time, 'PT1H'), string2, SUM(double4) FROM foo GROUP BY 1,2 ORDER BY 3",
+      // 17: time floor, non-expr col + expr agg
+      "SELECT TIME_FLOOR(__time, 'PT1H'), string2, SUM(long1 * double4) FROM foo GROUP BY 1,2 ORDER BY 3",
+      // 18: time floor + non-expr agg (timeseries) (non-expression reference)
+      "SELECT TIME_FLOOR(__time, 'PT1H'), SUM(long1) FROM foo GROUP BY 1 ORDER BY 1",
+      // 19: time floor + expr agg (timeseries)
+      "SELECT TIME_FLOOR(__time, 'PT1H'), SUM(long1 * long4) FROM foo GROUP BY 1 ORDER BY 1",
+      // 20: time floor + non-expr agg (group by)
+      "SELECT TIME_FLOOR(__time, 'PT1H'), SUM(long1) FROM foo GROUP BY 1 ORDER BY 2",
+      // 21: time floor + expr agg (group by)
+      "SELECT TIME_FLOOR(__time, 'PT1H'), SUM(long1 * long4) FROM foo GROUP BY 1 ORDER BY 2",
+      // 22: time floor offset by 1 day + non-expr agg (group by)
+      "SELECT TIME_FLOOR(TIMESTAMPADD(DAY, -1, __time), 'PT1H'), SUM(long1) FROM foo GROUP BY 1 ORDER BY 1",
+      // 23: time floor offset by 1 day + expr agg (group by)
+      "SELECT TIME_FLOOR(TIMESTAMPADD(DAY, -1, __time), 'PT1H'), SUM(long1 * long4) FROM foo GROUP BY 1 ORDER BY 1",
+      // 24: group by long expr with non-expr agg
+      "SELECT (long1 * long2), SUM(double1) FROM foo GROUP BY 1 ORDER BY 2",
+      // 25: group by non-expr with expr agg
+      "SELECT string2, SUM(long1 * long4) FROM foo GROUP BY 1 ORDER BY 2"
+  );
+
+  @Param({"5000000"})
+  private int rowsPerSegment;
+
+  @Param({"false", "force"})
+  private String vectorize;
+
+  @Param({
+      // non-expression reference
+      "0",
+      "1",
+      "2",
+      "3",
+      "4",
+      "5",
+      "6",
+      // expressions
+      "7",
+      "8",
+      "9",
+      "10",
+      "11",
+      "12",
+      "13",
+      "14",
+      "15",
+      "16",
+      "17",
+      "18",
+      "19",
+      "20",
+      "21",
+      "22",
+      "23",
+      "24",
+      "25"
+  })
+  private String query;
+
+  @Nullable
+  private PlannerFactory plannerFactory;
+  private Closer closer = Closer.create();
+
+  @Setup(Level.Trial)
+  public void setup() throws Exception
+  {
+    final GeneratorSchemaInfo schemaInfo = GeneratorBasicSchemas.SCHEMA_MAP.get("expression-testbench");
+
+    final DataSegment dataSegment = DataSegment.builder()
+                                               .dataSource("foo")
+                                               .interval(schemaInfo.getDataInterval())
+                                               .version("1")
+                                               .shardSpec(new LinearShardSpec(0))
+                                               .size(0)
+                                               .build();
+
+    final PlannerConfig plannerConfig = new PlannerConfig();
+
+    final SegmentGenerator segmentGenerator = closer.register(new SegmentGenerator());
+    log.info("Starting benchmark setup using cacheDir[%s], rows[%,d].", segmentGenerator.getCacheDir(), rowsPerSegment);
+    final QueryableIndex index = segmentGenerator.generate(dataSegment, schemaInfo, Granularities.NONE, rowsPerSegment);
+
+    final QueryRunnerFactoryConglomerate conglomerate = QueryStackTests.createQueryRunnerFactoryConglomerate(
+        closer,
+        PROCESSING_CONFIG
+    );
+
+    final SpecificSegmentsQuerySegmentWalker walker = new SpecificSegmentsQuerySegmentWalker(conglomerate).add(
+        dataSegment,
+        index
+    );
+    closer.register(walker);
+
+    final SchemaPlus rootSchema =
+        CalciteTests.createMockRootSchema(conglomerate, walker, plannerConfig, AuthTestUtils.TEST_AUTHORIZER_MAPPER);
+    plannerFactory = new PlannerFactory(
+        rootSchema,
+        CalciteTests.createMockQueryLifecycleFactory(walker, conglomerate),
+        CalciteTests.createOperatorTable(),
+        CalciteTests.createExprMacroTable(),
+        plannerConfig,
+        AuthTestUtils.TEST_AUTHORIZER_MAPPER,
+        CalciteTests.getJsonMapper(),
+        CalciteTests.DRUID_SCHEMA_NAME
+    );
+
+    checkSanity();
+  }
+
+  @TearDown(Level.Trial)
+  public void tearDown() throws Exception
+  {
+    closer.close();
+  }
+
+  @Benchmark
+  @BenchmarkMode(Mode.AverageTime)
+  @OutputTimeUnit(TimeUnit.MILLISECONDS)
+  public void querySql(Blackhole blackhole) throws Exception
+  {
+    final Map<String, Object> context = ImmutableMap.of("vectorize", vectorize);
+    final AuthenticationResult authenticationResult = NoopEscalator.getInstance()
+                                                                   .createEscalatedAuthenticationResult();
+    try (final DruidPlanner planner = plannerFactory.createPlanner(context, ImmutableList.of(), authenticationResult)) {
+      final PlannerResult plannerResult = planner.plan(QUERIES.get(Integer.parseInt(query)));
+      final Sequence<Object[]> resultSequence = plannerResult.run();
+      final Object[] lastRow = resultSequence.accumulate(null, (accumulated, in) -> in);
+      blackhole.consume(lastRow);
+    }
+  }
+
+  public void checkSanity() throws Exception

Review comment:
       It seemed like too much work to add this to `CalciteQueryTest`, but I did add a new `SqlVectorizedExpressionSanityTest` which does some of these query tests as part of CI, and migrated this logic there so that this benchmark can call into the test method.

##########
File path: processing/src/main/java/org/apache/druid/segment/virtual/ExpressionVirtualColumn.java
##########
@@ -129,13 +139,74 @@ public DimensionSelector makeDimensionSelector(
     return ExpressionSelectors.makeColumnValueSelector(factory, parsedExpression.get());
   }
 
+  @Override
+  public boolean canVectorize(ColumnInspector inspector)
+  {
+    final ExpressionPlan plan = ExpressionPlanner.plan(inspector, parsedExpression.get());
+    return plan.is(ExpressionPlan.Trait.VECTORIZABLE);
+  }
+
+  @Override
+  public VectorValueSelector makeVectorValueSelector(String columnName, VectorColumnSelectorFactory factory)
+  {
+    return ExpressionVectorSelectors.makeVectorValueSelector(factory, parsedExpression.get());
+  }
+
+  @Override
+  public VectorObjectSelector makeVectorObjectSelector(String columnName, VectorColumnSelectorFactory factory)
+  {
+    return ExpressionVectorSelectors.makeVectorObjectSelector(factory, parsedExpression.get());
+  }
+
   @Override
   public ColumnCapabilities capabilities(String columnName)
   {
-    // Note: Ideally we would fill out additional information instead of leaving capabilities as 'unknown', e.g. examine
-    // if the expression in question could potentially return multiple values and anything else. However, we don't
-    // currently have a good way of determining this, so fill this out more once we do
-    return new ColumnCapabilitiesImpl().setType(outputType);
+    // If possible, this should only be used as a fallback method for when capabilities are truly 'unknown', because we
+    // are unable to compute the output type of the expression, either due to incomplete type information of the
+    // inputs or because of unimplemented methods on expression implementations themselves, or, because a
+    // ColumnInspector is not available
+    return new ColumnCapabilitiesImpl().setType(outputType == null ? ValueType.FLOAT : outputType);
+  }
+
+  @Override
+  public ColumnCapabilities capabilities(ColumnInspector inspector, String columnName)
+  {
+    final ExpressionPlan plan = ExpressionPlanner.plan(inspector, parsedExpression.get());
+
+    if (plan.getOutputType() != null) {
+
+      if (outputType != null && ExprType.fromValueType(outputType) != plan.getOutputType()) {
+        log.warn(
+            "Projected output type %s of expression %s does not match provided type %s",
+            plan.getOutputType(),
+            expression,
+            outputType
+        );
+      }
+      final ExprType inferredOutputType = plan.getOutputType();
+      final ValueType valueType = ExprType.toValueType(inferredOutputType);
+      if (valueType.isNumeric()) {
+        // if float was explicitly specified preserve it, because it will currently never be the computed output type
+        if (ValueType.FLOAT == outputType) {
+          return ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
+        }
+        return ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ExprType.toValueType(inferredOutputType));

Review comment:
       fixed

##########
File path: processing/src/main/java/org/apache/druid/segment/virtual/ExpressionVirtualColumn.java
##########
@@ -129,13 +139,74 @@ public DimensionSelector makeDimensionSelector(
     return ExpressionSelectors.makeColumnValueSelector(factory, parsedExpression.get());
   }
 
+  @Override
+  public boolean canVectorize(ColumnInspector inspector)
+  {
+    final ExpressionPlan plan = ExpressionPlanner.plan(inspector, parsedExpression.get());
+    return plan.is(ExpressionPlan.Trait.VECTORIZABLE);
+  }
+
+  @Override
+  public VectorValueSelector makeVectorValueSelector(String columnName, VectorColumnSelectorFactory factory)
+  {
+    return ExpressionVectorSelectors.makeVectorValueSelector(factory, parsedExpression.get());
+  }
+
+  @Override
+  public VectorObjectSelector makeVectorObjectSelector(String columnName, VectorColumnSelectorFactory factory)
+  {
+    return ExpressionVectorSelectors.makeVectorObjectSelector(factory, parsedExpression.get());
+  }
+
   @Override
   public ColumnCapabilities capabilities(String columnName)
   {
-    // Note: Ideally we would fill out additional information instead of leaving capabilities as 'unknown', e.g. examine
-    // if the expression in question could potentially return multiple values and anything else. However, we don't
-    // currently have a good way of determining this, so fill this out more once we do
-    return new ColumnCapabilitiesImpl().setType(outputType);
+    // If possible, this should only be used as a fallback method for when capabilities are truly 'unknown', because we
+    // are unable to compute the output type of the expression, either due to incomplete type information of the
+    // inputs or because of unimplemented methods on expression implementations themselves, or, because a
+    // ColumnInspector is not available
+    return new ColumnCapabilitiesImpl().setType(outputType == null ? ValueType.FLOAT : outputType);
+  }
+
+  @Override
+  public ColumnCapabilities capabilities(ColumnInspector inspector, String columnName)
+  {
+    final ExpressionPlan plan = ExpressionPlanner.plan(inspector, parsedExpression.get());
+
+    if (plan.getOutputType() != null) {
+
+      if (outputType != null && ExprType.fromValueType(outputType) != plan.getOutputType()) {
+        log.warn(
+            "Projected output type %s of expression %s does not match provided type %s",
+            plan.getOutputType(),
+            expression,
+            outputType
+        );
+      }
+      final ExprType inferredOutputType = plan.getOutputType();
+      final ValueType valueType = ExprType.toValueType(inferredOutputType);
+      if (valueType.isNumeric()) {
+        // if float was explicitly specified preserve it, because it will currently never be the computed output type
+        if (ValueType.FLOAT == outputType) {
+          return ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.FLOAT);
+        }
+        return ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ExprType.toValueType(inferredOutputType));
+      }
+
+      // we don't have to check for unknown input here because output type is unable to be inferred if we don't know
+      // the complete set of input types
+      if (plan.any(ExpressionPlan.Trait.NON_SCALAR_OUTPUT, ExpressionPlan.Trait.NEEDS_APPLIED)) {

Review comment:
       Ah, unknown is effectively true in most cases (except for `SINGLE_INPUT_MAPPABLE`), but will change since both of these do imply multiple values

##########
File path: processing/src/main/java/org/apache/druid/segment/VirtualColumn.java
##########
@@ -246,6 +246,12 @@ default VectorObjectSelector makeVectorObjectSelector(
    */
   ColumnCapabilities capabilities(String columnName);
 
+
+  default ColumnCapabilities capabilities(ColumnInspector inspector, String columnName)

Review comment:
       I was initially not planning to deprecate for virtual columns that don't care about the other columns in the segment, but I went ahead and did it, since they can just ignore the `ColumnInspector`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org