You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "imply-cheddar (via GitHub)" <gi...@apache.org> on 2023/03/08 05:17:20 UTC

[GitHub] [druid] imply-cheddar commented on a diff in pull request #13686: Integrate the catalog with the Calcite planner

imply-cheddar commented on code in PR #13686:
URL: https://github.com/apache/druid/pull/13686#discussion_r1128931261


##########
server/src/main/java/org/apache/druid/catalog/model/ColumnSpec.java:
##########
@@ -108,7 +108,7 @@ public void validate()
     if (Strings.isNullOrEmpty(name)) {
       throw new IAE("Column name is required");
     }
-    // Validate type in the next PR
+    TypeParser.parse(sqlType);

Review Comment:
   The comment that previously existed makes me believe that the purpose of this line of code is to validate the sqlType.  Without that comment, I would look at this code and believe that it is a spurious thing that is doing nothing (it's called parse and yet doesn't keep the return value).  Please name it differently (validate?) or do something with the return value so that we can actually know that this code is attempting to validate something.



##########
server/src/main/java/org/apache/druid/catalog/model/MeasureTypes.java:
##########
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.catalog.model;
+
+import com.google.common.base.Strings;
+import com.google.common.collect.ImmutableMap;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.StringUtils;
+import org.apache.druid.segment.column.ColumnType;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
+
+public class MeasureTypes
+{
+  public static final String OPTIONAL = "--";
+
+  public enum BaseType
+  {
+    VARCHAR(ColumnType.STRING),
+    BIGINT(ColumnType.LONG),
+    FLOAT(ColumnType.FLOAT),
+    DOUBLE(ColumnType.DOUBLE);
+
+    public final ColumnType nativeType;
+
+    BaseType(ColumnType nativeType)
+    {
+      this.nativeType = nativeType;
+    }
+  }
+
+  public static class MeasureType
+  {
+    public final String name;
+    public final List<BaseType> argTypes;
+    public final String sqlSeedFn;
+    public final String sqlReducerFn;
+    public final String nativeType;
+    public final String nativeAggFn;
+    public final ColumnType storageType;
+    public final String nativeReducerFn;
+
+    public MeasureType(
+        final String name,
+        final List<BaseType> argTypes,
+        final String sqlSeedFn,
+        final String sqlReducerFn,
+        final String nativeType,
+        final ColumnType storageType,
+        final String nativeAggFn,
+        final String nativeReducerFn
+    )
+    {
+      this.name = name;
+      this.argTypes = argTypes == null ? Collections.emptyList() : argTypes;
+      this.sqlSeedFn = sqlSeedFn;
+      this.sqlReducerFn = sqlReducerFn;
+      this.nativeType = nativeType;
+      this.storageType = storageType;
+      this.nativeAggFn = nativeAggFn;
+      this.nativeReducerFn = nativeReducerFn;
+    }
+
+    @Override
+    public String toString()
+    {
+      StringBuilder buf = new StringBuilder()
+          .append(name)
+          .append("(");
+      for (int i = 0; i < argTypes.size(); i++) {
+        if (i > 0) {
+          buf.append(", ");
+        }
+        buf.append(argTypes.get(i).name());
+      }
+      return buf.append(")").toString();
+    }
+  }
+
+  // See: https://druid.apache.org/docs/latest/querying/aggregations.html
+  public static final MeasureType COUNT_TYPE = new MeasureType(
+      "COUNT",
+      null,
+      null,
+      "SUM",
+      "longSum",
+      ColumnType.LONG,
+      "count",
+      "longSum"
+  );
+
+  public static final MeasureType SUM_BIGINT_TYPE = simpleAggType("sum", BaseType.BIGINT);
+  public static final MeasureType SUM_FLOAT_TYPE = simpleAggType("sum", BaseType.FLOAT);
+  public static final MeasureType SUM_DOUBLE_TYPE = simpleAggType("sum", BaseType.DOUBLE);
+  public static final MeasureType MIN_BIGINT_TYPE = simpleAggType("min", BaseType.BIGINT);
+  public static final MeasureType MIN_FLOAT_TYPE = simpleAggType("min", BaseType.FLOAT);
+  public static final MeasureType MIN_DOUBLE_TYPE = simpleAggType("min", BaseType.DOUBLE);
+  public static final MeasureType MAX_BIGINT_TYPE = simpleAggType("max", BaseType.BIGINT);
+  public static final MeasureType MAX_FLOAT_TYPE = simpleAggType("max", BaseType.FLOAT);
+  public static final MeasureType MAX_DOUBLE_TYPE = simpleAggType("max", BaseType.DOUBLE);

Review Comment:
   There are significantly more measures possible than this list.  I'm assuming that this is just a starting point.  Given that new aggregators can be added by extensions, the registration of which kinds of aggregators turn into which types of things should likely be done by the same mechanism that registers the native aggregator with SQL in the first place.
   
   If we want to merge the code as is with this limited list of aggs just to get the bones in place, I'm fine with that.  But, we should note that the catalog won't actually be able to support management of rolled up schemas until we have added a mechanism at the attach-point for native into SQL.



##########
server/src/main/java/org/apache/druid/catalog/model/table/ExternalTableDefn.java:
##########
@@ -282,7 +288,14 @@ public TableFunction tableFn(ResolvedTable table)
   @Override
   protected void validateColumn(ColumnSpec colSpec)
   {
-    // Validate type in next PR
+    ParsedType type = TypeParser.parse(colSpec.sqlType());
+    if (type.kind() == ParsedType.Kind.MEASURE) {
+      throw new IAE(
+          "External column %s cannot use measure SQL type %s",
+          colSpec.name(),
+          colSpec.sqlType()
+      );
+    }

Review Comment:
   Please interpolate more about how the thing was actually parsed:
   
   ```
   "External column[%s] of type[%s] parsed to a MEASURE type[%s], which is not allowed", colSpec.name(), colSpec.sqlType(), type
   ```



##########
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidCatalogReader.java:
##########
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite.planner;
+
+import com.google.common.collect.Iterators;
+import org.apache.calcite.config.CalciteConnectionConfig;
+import org.apache.calcite.jdbc.CalciteSchema;
+import org.apache.calcite.prepare.CalciteCatalogReader;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.schema.Function;
+import org.apache.calcite.sql.SqlCall;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlIdentifier;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.SqlSyntax;
+import org.apache.calcite.sql.validate.SqlNameMatcher;
+import org.apache.calcite.sql.validate.SqlValidatorUtil;
+import org.apache.druid.catalog.model.TableId;
+import org.apache.druid.java.util.common.ISE;
+import org.apache.druid.server.security.ResourceAction;
+import org.apache.druid.sql.calcite.external.DruidTableMacro;
+import org.apache.druid.sql.calcite.external.DruidUserDefinedTableMacro;
+import org.apache.druid.sql.calcite.external.Externals;
+
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * Druid-specific catalog reader which provides the special processing needed for
+ * external table macros which must be wrapped in a Druid-specific operator class.
+ */
+public class DruidCatalogReader extends CalciteCatalogReader
+{
+  public DruidCatalogReader(
+      final CalciteSchema rootSchema,
+      final List<String> defaultSchema,
+      final RelDataTypeFactory typeFactory,
+      final CalciteConnectionConfig config
+  )
+  {
+    super(rootSchema, defaultSchema, typeFactory, config);
+    // TODO Auto-generated constructor stub

Review Comment:
   What's TODO about the auto-generated constructor stub?  :trollface: 



##########
server/src/main/java/org/apache/druid/catalog/model/table/DatasourceDefn.java:
##########
@@ -149,7 +141,15 @@ protected void validateColumn(ColumnSpec spec)
   {
     super.validateColumn(spec);
     if (Columns.isTimeColumn(spec.name()) && spec.sqlType() != null) {
-      // Validate type in next PR
+      ParsedType type = TypeParser.parse(spec.sqlType());
+      if (type.kind() != ParsedType.Kind.TIME) {
+        throw new IAE(StringUtils.format(
+            "%s column must have no SQL type or SQL type %s",
+            Columns.TIME_COLUMN,
+            Columns.TIMESTAMP
+            )
+        );

Review Comment:
   You don't need `StringUtils.format` when building an IAE, it does the formatting for you.
   
   Your message is not telling me what type it thinks it got.  All I know from this message is that the time column must be a timestamp, I don't know anything about what the thing that I submitted is actually interpreted.  Please interpolate in the resolved `type`.



##########
server/src/main/java/org/apache/druid/catalog/model/TypeParser.java:
##########
@@ -0,0 +1,297 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.catalog.model;
+
+import org.apache.druid.catalog.model.MeasureTypes.MeasureType;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.StringUtils;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Parser for the SQL type format used in the catalog.
+ * <ul>
+ * <li>{@code null} - A null type means "use whatever Druid wants."</li>
+ * <li>Scalar - {@code VARCHAR}, {@code BIGINT}, etc.</li>
+ * <li>{@code TIMESTAMP('<grain'>} - truncated timestamp</li>
+ * <li>{@code fn(<type>)} - aggregate measure such as @{code COUNT},
+ *     {@code COUNT()} or {@code SUM(BIGINT)}.</li>
+ * </ul>
+ */
+public class TypeParser
+{
+  public static final ParsedType ANY_TYPE = new ParsedType(null, ParsedType.Kind.ANY, null, null);
+  public static final ParsedType TIME_TYPE = new ParsedType(null, ParsedType.Kind.TIME, null, null);
+
+  public static class ParsedType
+  {
+    public enum Kind
+    {
+      ANY,
+      TIME,
+      DIMENSION,
+      MEASURE
+    }
+
+    private final String type;
+    private final Kind kind;
+    private final String timeGrain;
+    private final MeasureType measureType;
+
+    public ParsedType(String type, Kind kind, String timeGrain, MeasureType measureType)
+    {
+      this.type = type;
+      this.kind = kind;
+      this.timeGrain = timeGrain;
+      this.measureType = measureType;
+    }
+
+    public String type()
+    {
+      return type;
+    }
+
+    public Kind kind()
+    {
+      return kind;
+    }
+
+    public String timeGrain()
+    {
+      return timeGrain;
+    }
+
+    public MeasureType measure()
+    {
+      return measureType;
+    }
+
+    @Override
+    public String toString()
+    {
+      StringBuilder buf = new StringBuilder("ParsedType{type=")
+          .append(type)
+          .append(", kind=")
+          .append(kind.name());
+      if (timeGrain != null) {
+        buf.append(", time grain=").append(timeGrain);
+      }
+      if (measureType != null) {
+        buf.append(", measure type=").append(measureType);
+      }
+      return buf.append("}").toString();
+    }
+  }
+
+  private static class Token
+  {
+    private enum Kind
+    {
+      SYMBOL,
+      OPEN,
+      STRING,
+      COMMA,
+      CLOSE
+    }
+
+    private final Kind kind;
+    private final String value;
+
+    public Token(Kind kind, String value)
+    {
+      this.kind = kind;
+      this.value = value;
+    }
+  }
+
+  private String input;
+  private int posn;
+
+  private TypeParser(String type)
+  {
+    this.input = type;
+  }
+
+  public static ParsedType parse(String type)
+  {
+    if (type == null) {
+      return ANY_TYPE;
+    }
+    return new TypeParser(type).parse();
+  }
+
+  private ParsedType parse()
+  {
+    Token token = parseToken();
+    if (token == null) {
+      return ANY_TYPE;
+    }
+    if (token.kind != Token.Kind.SYMBOL) {
+      throw new IAE("Invalid type name");
+    }
+    final String baseName = StringUtils.toUpperCase(token.value);
+    boolean isTime = Columns.isTimestamp(baseName);
+    token = parseToken();
+    if (token == null) {
+      if (isTime) {
+        return new ParsedType(baseName, ParsedType.Kind.TIME, null, null);
+      } else if (Columns.isScalar(baseName)) {
+        return new ParsedType(baseName, ParsedType.Kind.DIMENSION, null, null);
+      }
+      return analyzeAggregate(baseName, Collections.emptyList());
+    }
+    if (token.kind != Token.Kind.OPEN) {
+      throw new IAE("Invalid type name");
+    }
+    if (!isTime && Columns.isScalar(baseName)) {
+      throw new IAE("Invalid type name");
+    }
+    List<Token> args = new ArrayList<>();
+    token = parseToken();
+    if (token == null) {
+      throw new IAE("Invalid type name");
+    }
+    if (token.kind != Token.Kind.CLOSE) {
+      if (token.kind != Token.Kind.SYMBOL && token.kind != Token.Kind.STRING) {
+        throw new IAE("Invalid type name");
+      }
+      args.add(token);
+      while (true) {
+        token = parseToken();
+        if (token == null) {
+          throw new IAE("Invalid type name");
+        }
+        if (token.kind == Token.Kind.CLOSE) {
+          break;
+        }
+        if (token.kind != Token.Kind.COMMA) {
+          throw new IAE("Invalid type name");
+        }
+        token = parseToken();
+        if (token.kind != Token.Kind.SYMBOL && token.kind != Token.Kind.STRING) {
+          throw new IAE("Invalid type name");
+        }
+        args.add(token);
+      }
+    }
+    token = parseToken();
+    if (token != null) {
+      throw new IAE("Invalid type name");
+    }
+    if (isTime) {
+      return analyzeTimestamp(args);
+    } else {
+      return analyzeAggregate(baseName, args);
+    }
+  }
+
+  private ParsedType analyzeTimestamp(List<Token> args)
+  {
+    if (args.isEmpty()) {
+      // Odd: TIMESTAMP(), but OK...

Review Comment:
   Is this comment saying 
   
   > This case means that someone wrote TIMESTAMP such that analyzeTimestamp is called, but then they had zero arguments, so it must look like "TIMESTAMP()".
   
   If so, please use more words...  If I got it totally wrong, then please definitely use more words.



##########
server/src/main/java/org/apache/druid/catalog/model/TypeParser.java:
##########
@@ -0,0 +1,297 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.catalog.model;
+
+import org.apache.druid.catalog.model.MeasureTypes.MeasureType;
+import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.StringUtils;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Parser for the SQL type format used in the catalog.
+ * <ul>
+ * <li>{@code null} - A null type means "use whatever Druid wants."</li>
+ * <li>Scalar - {@code VARCHAR}, {@code BIGINT}, etc.</li>
+ * <li>{@code TIMESTAMP('<grain'>} - truncated timestamp</li>
+ * <li>{@code fn(<type>)} - aggregate measure such as @{code COUNT},
+ *     {@code COUNT()} or {@code SUM(BIGINT)}.</li>
+ * </ul>
+ */
+public class TypeParser
+{
+  public static final ParsedType ANY_TYPE = new ParsedType(null, ParsedType.Kind.ANY, null, null);
+  public static final ParsedType TIME_TYPE = new ParsedType(null, ParsedType.Kind.TIME, null, null);
+
+  public static class ParsedType
+  {
+    public enum Kind
+    {
+      ANY,
+      TIME,
+      DIMENSION,
+      MEASURE
+    }
+
+    private final String type;
+    private final Kind kind;
+    private final String timeGrain;
+    private final MeasureType measureType;
+
+    public ParsedType(String type, Kind kind, String timeGrain, MeasureType measureType)
+    {
+      this.type = type;
+      this.kind = kind;
+      this.timeGrain = timeGrain;
+      this.measureType = measureType;
+    }
+
+    public String type()
+    {
+      return type;
+    }
+
+    public Kind kind()
+    {
+      return kind;
+    }
+
+    public String timeGrain()
+    {
+      return timeGrain;
+    }
+
+    public MeasureType measure()
+    {
+      return measureType;
+    }
+
+    @Override
+    public String toString()
+    {
+      StringBuilder buf = new StringBuilder("ParsedType{type=")
+          .append(type)
+          .append(", kind=")
+          .append(kind.name());
+      if (timeGrain != null) {
+        buf.append(", time grain=").append(timeGrain);
+      }
+      if (measureType != null) {
+        buf.append(", measure type=").append(measureType);
+      }
+      return buf.append("}").toString();
+    }
+  }
+
+  private static class Token
+  {
+    private enum Kind
+    {
+      SYMBOL,
+      OPEN,
+      STRING,
+      COMMA,
+      CLOSE
+    }
+
+    private final Kind kind;
+    private final String value;
+
+    public Token(Kind kind, String value)
+    {
+      this.kind = kind;
+      this.value = value;
+    }
+  }
+
+  private String input;
+  private int posn;
+
+  private TypeParser(String type)
+  {
+    this.input = type;
+  }
+
+  public static ParsedType parse(String type)
+  {
+    if (type == null) {
+      return ANY_TYPE;
+    }
+    return new TypeParser(type).parse();
+  }
+
+  private ParsedType parse()
+  {
+    Token token = parseToken();
+    if (token == null) {
+      return ANY_TYPE;
+    }
+    if (token.kind != Token.Kind.SYMBOL) {
+      throw new IAE("Invalid type name");
+    }
+    final String baseName = StringUtils.toUpperCase(token.value);
+    boolean isTime = Columns.isTimestamp(baseName);
+    token = parseToken();
+    if (token == null) {
+      if (isTime) {
+        return new ParsedType(baseName, ParsedType.Kind.TIME, null, null);
+      } else if (Columns.isScalar(baseName)) {
+        return new ParsedType(baseName, ParsedType.Kind.DIMENSION, null, null);
+      }
+      return analyzeAggregate(baseName, Collections.emptyList());
+    }
+    if (token.kind != Token.Kind.OPEN) {
+      throw new IAE("Invalid type name");
+    }
+    if (!isTime && Columns.isScalar(baseName)) {
+      throw new IAE("Invalid type name");
+    }
+    List<Token> args = new ArrayList<>();
+    token = parseToken();
+    if (token == null) {
+      throw new IAE("Invalid type name");
+    }
+    if (token.kind != Token.Kind.CLOSE) {
+      if (token.kind != Token.Kind.SYMBOL && token.kind != Token.Kind.STRING) {
+        throw new IAE("Invalid type name");
+      }
+      args.add(token);
+      while (true) {
+        token = parseToken();
+        if (token == null) {
+          throw new IAE("Invalid type name");

Review Comment:
   None of these error messages tell me the type name that was invalid.  Might as well interpolate the `input` parameter.  Also, it would be especially nice to provide extra words and context for each of the throw-points to help someone trying to figure out what's going on
   
   For example, 
   
   ```
       if (!isTime && Columns.isScalar(baseName)) {
         throw new IAE("Invalid type name");
       }
   ```
   
   Seems like it could provide `baseName` and that it's not time/a scalar.  Stuff like that.



##########
processing/src/main/java/org/apache/druid/segment/column/RowSignature.java:
##########
@@ -190,6 +192,11 @@ private List<ColumnSignature> asColumnSignatures()
     return retVal;
   }
 
+  public Set<Entry<String, ColumnType>> entries()
+  {
+    return columnTypes.entrySet();
+  }
+

Review Comment:
   It appears that this method was added so that `LiveCatalogResolver.mergeDatasource` can use it to get the list of column names.  There's already a method `.getColumnNames` that can be used for this purpose.  As such, we don't need this method at all, please remove.



##########
sql/src/main/codegen/includes/common.ftl:
##########
@@ -18,59 +18,51 @@
  */
 
 // Using fully qualified name for Pair class, since Calcite also has a same class name being used in the Parser.jj
-org.apache.druid.java.util.common.Pair<Granularity, String> PartitionGranularity() :
+SqlNode PartitionGranularity() :

Review Comment:
   I think that this change makes the comment above this stale?  If so, please also clean up the comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org