Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/01 01:22:22 UTC

[GitHub] [incubator-pinot] npawar opened a new pull request #5642: Validate reserved keywords

npawar opened a new pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642


   ## Description
   Adding schema validations for
   1. reserved keywords being used as field names
   2. format and granularity string in dateTimeFieldSpec
   
   ## Release Notes
   The check for reserved keywords fails the add/update/validate calls for schemas containing SQL keywords (e.g. `timestamp`, `Date`). As a result, **updates to existing schemas containing these keywords will fail**. Validation calls on these schemas will also fail.
   A queryParam `validateFieldNames` has been provided on the add, update, and validate schema calls. It defaults to `true`. **For operations on schemas with invalid field names to succeed, this flag needs to be set to `false`.**
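   
   For illustration only: the controller address, schema file, and client code below are placeholders (only the `validateFieldNames` queryParam itself comes from this PR), and the exact endpoint and content type may differ in your setup. A client that must keep adding or updating a schema containing reserved keywords could pass the flag roughly like this:
   ```java
   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;
   import java.nio.file.Files;
   import java.nio.file.Path;
   
   public class AddSchemaSkippingFieldNameCheck {
     public static void main(String[] args) throws Exception {
       // Assumes a controller at localhost:9000 and the schema JSON on local disk.
       String schemaJson = Files.readString(Path.of("mySchema.json"));
       HttpRequest request = HttpRequest.newBuilder()
           .uri(URI.create("http://localhost:9000/schemas?validateFieldNames=false"))
           .header("Content-Type", "application/json")
           .POST(HttpRequest.BodyPublishers.ofString(schemaJson))
           .build();
       HttpResponse<String> response =
           HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
       System.out.println(response.statusCode() + " " + response.body());
     }
   }
   ```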
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #5642: Add more schema validations

mcvsubbu commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657716980


   @siddharthteotia please address this from the standpoint of migrating an existing table from PQL to SQL for querying.




[GitHub] [incubator-pinot] npawar closed pull request #5642: Add more schema validations

npawar closed pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642


   




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #5642: Add more schema validations

mcvsubbu commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657778475


   We can introduce versioning if we want to migrate the REST API. For example:
   `/v2/schemas/....`




[GitHub] [incubator-pinot] siddharthteotia edited a comment on pull request #5642: Add more schema validations

siddharthteotia edited a comment on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657709339


   > > @mayankshriv @siddharthteotia Please verify that LinkedIn use cases are not using the reserved SQL keyword as the column name
   > 
   > I believe some reserved keywords are being used. You will want to do this
   > 
   > ```
   > A queryParam validateFieldNames has been provided to the /add /update and /validate schema calls. It is by default true. For operations on schemas with invalid field names to be successful, this flag needs to be set to false.
   > ```
   
   IIUC, this will require use cases at LinkedIn to change their application code and pass this param? I have seen use cases with reserved keywords (like date, count, etc.) as field names. Calcite allows us to escape such identifiers, and that was the original plan to work around this problem when migrating to SQL.
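   
   (The escaping referred to here means quoting identifiers so the parser stops treating them as keywords. A standalone Calcite sketch, with made-up table/column names and not tied to any Pinot code path:)
   ```java
   import org.apache.calcite.avatica.util.Quoting;
   import org.apache.calcite.sql.SqlNode;
   import org.apache.calcite.sql.parser.SqlParser;
   
   public class EscapedIdentifiers {
     public static void main(String[] args) throws Exception {
       // "date" and "timestamp" are reserved words; double-quoting lets the parser
       // accept them as ordinary column identifiers.
       String sql = "SELECT \"date\", \"timestamp\" FROM myTable WHERE \"date\" > 20200701";
       SqlParser parser = SqlParser.create(sql,
           SqlParser.configBuilder().setQuoting(Quoting.DOUBLE_QUOTE).build());
       SqlNode parsed = parser.parseQuery();
       System.out.println(parsed);
     }
   }
   ```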




[GitHub] [incubator-pinot] npawar commented on a change in pull request #5642: Add more schema validations

npawar commented on a change in pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#discussion_r450457863



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java
##########
@@ -354,7 +354,7 @@ private boolean isValid(Schema schema, IndexingConfig indexingConfig) {
       }
     }
     // 2. We want to get the schema errors, if any, even if isValid is false;
-    if (!SchemaUtils.validate(schema, _logger)) {
+    if (!SchemaUtils.validate(schema, false, _logger)) {

Review comment:
       the schema used here is the one from ZK metadata. The field name validation will already have been performed when the schema was added or updated.
   If the user skipped the reserved-keyword validation when adding/updating the schema, then this schema will contain reserved keywords. Hence, at this point, we should not enforce further field name validation, as we don't know whether those names are intentional or not.






[GitHub] [incubator-pinot] npawar commented on a change in pull request #5642: Add more schema validations

npawar commented on a change in pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#discussion_r450461863



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,130 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase(Locale.ROOT))) {

Review comment:
       yes that was unnecessary, removed
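   
   (For context, the check this thread refers to boils down to asking Calcite's Babel parser metadata whether the upper-cased field name is a reserved word, i.e. the same construction as the `SQL_PARSER_METADATA` constant in the diff above. A standalone sketch with made-up column names, assuming calcite-core and calcite-babel are on the classpath:)
   ```java
   import org.apache.calcite.sql.parser.SqlAbstractParserImpl;
   import org.apache.calcite.sql.parser.SqlParser;
   import org.apache.calcite.sql.parser.babel.SqlBabelParserImpl;
   import org.apache.calcite.sql.validate.SqlConformanceEnum;
   
   public class ReservedWordCheck {
     private static final SqlAbstractParserImpl.Metadata METADATA = SqlParser.create("",
         SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL)
             .setParserFactory(SqlBabelParserImpl.FACTORY).build()).getMetadata();
   
     public static void main(String[] args) {
       // Per the PR description, names like "timestamp" and "Date" trip this check,
       // while an ordinary name like "playerName" passes.
       for (String column : new String[]{"timestamp", "Date", "playerName"}) {
         System.out.println(column + " -> reserved: " + METADATA.isReservedWord(column.toUpperCase()));
       }
     }
   }
   ```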






[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5642: Add more schema validations

Jackie-Jiang commented on a change in pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#discussion_r450426476



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,130 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase(Locale.ROOT))) {

Review comment:
       Why use `Locale.ROOT`? It seems we are using the default (`columnName.toUpperCase()`) everywhere else.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java
##########
@@ -354,7 +354,7 @@ private boolean isValid(Schema schema, IndexingConfig indexingConfig) {
       }
     }
     // 2. We want to get the schema errors, if any, even if isValid is false;
-    if (!SchemaUtils.validate(schema, _logger)) {
+    if (!SchemaUtils.validate(schema, false, _logger)) {

Review comment:
       Why skip the field name validation here?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,130 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase(Locale.ROOT))) {
+      LOGGER.error("Cannot use SQL reserved word {} as field name in the schema", columnName);
+      return false;
+    }
+    return true;
+  }
+
+  /**
+   * Checks for valid transform function string
+   */
+  private static boolean isValidTransformFunction(FieldSpec fieldSpec) {
+    String column = fieldSpec.getName();
+    String transformFunction = fieldSpec.getTransformFunction();
+    if (transformFunction != null) {
+      FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);

Review comment:
       You may want a try-catch around this to prevent the validation from throwing an exception.
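
       For illustration, the suggested shape could look roughly like this (a sketch assembled from the diff above plus the try-catch; not the committed code):
   ```java
   private static boolean isValidTransformFunction(FieldSpec fieldSpec) {
     String column = fieldSpec.getName();
     String transformFunction = fieldSpec.getTransformFunction();
     if (transformFunction == null) {
       return true;
     }
     FunctionEvaluator functionEvaluator;
     try {
       // Guard evaluator construction so a bad transform string fails validation
       // instead of bubbling up as an exception.
       functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
     } catch (Exception e) {
       LOGGER.error("Invalid transform function: {} for column: {}", transformFunction, column, e);
       return false;
     }
     // The destination column must not appear among the transform function's arguments.
     if (functionEvaluator.getArguments().contains(column)) {
       LOGGER.error("The arguments of transform function: {} should not contain the destination column: {}",
           transformFunction, column);
       return false;
     }
     return true;
   }
   ```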

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,130 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase(Locale.ROOT))) {
+      LOGGER.error("Cannot use SQL reserved word {} as field name in the schema", columnName);
+      return false;
+    }
+    return true;
+  }
+
+  /**
+   * Checks for valid transform function string
+   */
+  private static boolean isValidTransformFunction(FieldSpec fieldSpec) {
+    String column = fieldSpec.getName();
+    String transformFunction = fieldSpec.getTransformFunction();
+    if (transformFunction != null) {
+      FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
+      if (functionEvaluator != null) {

Review comment:
       `functionEvaluator` should not be `null`






[GitHub] [incubator-pinot] npawar commented on pull request #5642: Add more schema validations

npawar commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657733441


   > > > @mayankshriv @siddharthteotia Please verify that LinkedIn use cases are not using the reserved SQL keyword as the column name
   > > 
   > > 
   > > I believe some reserved keywords are being used. You will want to do this
   > > ```
   > > A queryParam validateFieldNames has been provided to the /add /update and /validate schema calls. It is by default true. For operations on schemas with invalid field names to be successful, this flag needs to be set to false.
   > > ```
   > 
   > IIUC, this will require use cases at Li to change their application code and pass this param? I have seen use cases having reserved keywords (like date, count etc) as field names. Calcite allows us to escape and that was the original plan to work around this problem when migrating to SQL.
   
   Yes, calls that add schemas with any of these keywords, or that update/validate existing schemas containing them, will have to be changed to pass the param. This does not necessarily mean that every application has to change its code; it could be handled in any of the layers between the application and the controller (wrapper, onboarding platform).
   
   We could look into making this a cluster-level config - do you think that would be easier?
   
   It is true that Calcite allows escaping. However, we're seeing more and more users in the community use these keywords as column names and then spend time debugging/wondering why some queries fail. Plus, with so many external integrations like Superset and ThirdEye, it is not easy to control the escaping of the keywords.
   
   Another alternative is to parse the query on the broker, identify keywords, and escape them before feeding the query to the Calcite parser. But then we'd be introducing expensive regexes, and it would be cumbersome to make that completely exhaustive.




[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5642: Add more schema validations

siddharthteotia commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657709339


   > > @mayankshriv @siddharthteotia Please verify that LinkedIn use cases are not using the reserved SQL keyword as the column name
   > 
   > I believe some reserved keywords are being used. You will want to do this
   > 
   > ```
   > A queryParam validateFieldNames has been provided to the /add /update and /validate schema calls. It is by default true. For operations on schemas with invalid field names to be successful, this flag needs to be set to false.
   > ```
   
   IIUC, this will require use cases at LinkedIn to change their application code and pass this param. I have seen use cases with reserved keywords (like date, count, etc.) as field names. Calcite allows us to escape such identifiers, and that was the original plan to work around this problem when migrating to SQL.




[GitHub] [incubator-pinot] npawar commented on pull request #5642: Add more schema validations

npawar commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-658486993


   Closing this. https://github.com/apache/incubator-pinot/pull/5706 has the dateTimeFieldSpec-related changes.




[GitHub] [incubator-pinot] npawar commented on a change in pull request #5642: Add more schema validations

npawar commented on a change in pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#discussion_r450461941



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,130 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase(Locale.ROOT))) {
+      LOGGER.error("Cannot use SQL reserved word {} as field name in the schema", columnName);
+      return false;
+    }
+    return true;
+  }
+
+  /**
+   * Checks for valid transform function string
+   */
+  private static boolean isValidTransformFunction(FieldSpec fieldSpec) {
+    String column = fieldSpec.getName();
+    String transformFunction = fieldSpec.getTransformFunction();
+    if (transformFunction != null) {
+      FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
+      if (functionEvaluator != null) {

Review comment:
       right, removed null check

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,130 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase(Locale.ROOT))) {
+      LOGGER.error("Cannot use SQL reserved word {} as field name in the schema", columnName);
+      return false;
+    }
+    return true;
+  }
+
+  /**
+   * Checks for valid transform function string
+   */
+  private static boolean isValidTransformFunction(FieldSpec fieldSpec) {
+    String column = fieldSpec.getName();
+    String transformFunction = fieldSpec.getTransformFunction();
+    if (transformFunction != null) {
+      FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);

Review comment:
       done






[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5642: Add more schema validations

siddharthteotia commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657774046


   Sorry if this is a REST API 101 question. Is it possible to do method overloading in REST, where we have two addSchema() APIs? The second one would take the newly added argument (defaulting to true) and give the new behavior, while we keep the old API with its current arguments and behavior. I am just trying to find a way to incorporate this without forcing existing callers to change.
   
   Note that this is not just about SQL migration. It affects everyone in the PQL world today as well, since their calls to add/update schema will start failing.
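   
   For context, a common way to express this in JAX-RS is a single endpoint with an optional query parameter whose default controls the behavior, rather than a second overloaded method. A minimal sketch, assuming a Jersey-style resource; the class name, path, and method body are illustrative, not Pinot's actual controller code:
   
   ```java
   import javax.ws.rs.Consumes;
   import javax.ws.rs.DefaultValue;
   import javax.ws.rs.POST;
   import javax.ws.rs.Path;
   import javax.ws.rs.QueryParam;
   import javax.ws.rs.core.MediaType;
   
   @Path("/schemas")
   public class SchemaResource {
   
     // Callers that omit the parameter get the default; the compatibility question in this
     // thread is whether that default should be "true" (new, stricter behavior) or "false"
     // (today's behavior, leaving existing callers unaffected).
     @POST
     @Consumes(MediaType.APPLICATION_JSON)
     public String addSchema(String schemaJson,
         @QueryParam("validateFieldNames") @DefaultValue("true") boolean validateFieldNames) {
       // A real endpoint would deserialize schemaJson and run the schema validation here.
       return validateFieldNames ? "schema validated and added" : "schema added without field-name validation";
     }
   }
   ```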




[GitHub] [incubator-pinot] mcvsubbu commented on pull request #5642: Add more schema validations

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657771823


   Can we make `validateFieldNames` a cluster setting instead of taking it on each request?




[GitHub] [incubator-pinot] npawar commented on pull request #5642: Add more schema validations

Posted by GitBox <gi...@apache.org>.
npawar commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-654449955


   > @mayankshriv @siddharthteotia Please verify that LinkedIn use cases are not using reserved SQL keywords as column names
   
   I believe some reserved keywords are being used. You will want to do this:
   ```
   A queryParam validateFieldNames has been provided to the /add /update and /validate schema calls. It is by default true. For operations on schemas with invalid field names to be successful, this flag needs to be set to false.
   ```




[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5642: Add more schema validations

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#discussion_r453785181



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,134 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {

Review comment:
       Pass the logger to all the isValidXXX methods so that you can log the message to the right logger.
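
       For example, each isValidXXX helper could take the caller-supplied logger and fall back to the class-level logger when none is given. A sketch mirroring the field-name check from the diff above (not the final patch):

       ```java
       private static boolean isValidFieldName(FieldSpec fieldSpec, @Nullable Logger logger) {
         // Use the caller's logger when provided, otherwise the class-level LOGGER.
         Logger log = (logger != null) ? logger : LOGGER;
         String columnName = fieldSpec.getName();
         if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase(Locale.ROOT))) {
           log.error("Cannot use SQL reserved word {} as field name in the schema", columnName);
           return false;
         }
         return true;
       }
       ```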

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,134 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",

Review comment:
       I would not take a SQL dependency here. I think it is cleaner to keep the query language separate from the Pinot schema; no matter which query language we use, there will be some reserved words.
   I would have the SQL layer make an init call into SchemaUtils to set the reserved words here.
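
       One way to read that suggestion: the schema module holds only a plain set of reserved words, and whichever query module is in use registers them once at startup. A hypothetical sketch; the class and method names are made up for illustration and are not existing Pinot code:

       ```java
       import java.util.Collections;
       import java.util.HashSet;
       import java.util.Locale;
       import java.util.Set;

       // Hypothetical holder consulted by SchemaUtils instead of depending on the Calcite parser.
       public final class ReservedWords {
         private static volatile Set<String> _reservedWords = Collections.emptySet();

         private ReservedWords() {
         }

         // Called once at startup by the SQL/query module, which knows its own keywords.
         public static void init(Set<String> reservedWords) {
           Set<String> upperCased = new HashSet<>();
           for (String word : reservedWords) {
             upperCased.add(word.toUpperCase(Locale.ROOT));
           }
           _reservedWords = Collections.unmodifiableSet(upperCased);
         }

         public static boolean isReserved(String columnName) {
           return _reservedWords.contains(columnName.toUpperCase(Locale.ROOT));
         }
       }
       ```

       SchemaUtils would then call ReservedWords.isReserved(columnName) with no compile-time dependency on any SQL parser.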

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,134 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);

Review comment:
       ```suggestion
         logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
   ```

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,134 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase())) {
+      LOGGER.error("Cannot use SQL reserved word {} as field name in the schema", columnName);
+      return false;
+    }
+    return true;
+  }
+
+  /**
+   * Checks for valid transform function string
+   */
+  private static boolean isValidTransformFunction(FieldSpec fieldSpec) {
+    String column = fieldSpec.getName();
+    String transformFunction = fieldSpec.getTransformFunction();
+    if (transformFunction != null) {
+      try {
+        List<String> arguments = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec).getArguments();

Review comment:
       Would it be useful to check for circular dependencies here? What if we have column A used as an argument for column B and vice versa? Should we not resolve the arguments of all fields before declaring the schema good or bad?
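
       A schema-wide pass could build a destination-column -> argument-columns map from the resolved evaluators and reject any cycle before accepting the schema. A standalone sketch; the class name and the way the map is built are assumptions for illustration, not an existing Pinot helper:

       ```java
       import java.util.Collections;
       import java.util.HashSet;
       import java.util.List;
       import java.util.Map;
       import java.util.Set;

       public final class TransformCycleCheck {

         // True if the destination-column -> argument-columns graph contains a cycle,
         // e.g. column A derived from B while B is derived from A.
         public static boolean hasCycle(Map<String, List<String>> argumentsByColumn) {
           Set<String> inProgress = new HashSet<>();
           Set<String> done = new HashSet<>();
           for (String column : argumentsByColumn.keySet()) {
             if (visit(column, argumentsByColumn, inProgress, done)) {
               return true;
             }
           }
           return false;
         }

         private static boolean visit(String column, Map<String, List<String>> argumentsByColumn,
             Set<String> inProgress, Set<String> done) {
           if (done.contains(column)) {
             return false;
           }
           if (!inProgress.add(column)) {
             return true;  // revisiting a column on the current path means a cycle
           }
           for (String argument : argumentsByColumn.getOrDefault(column, Collections.emptyList())) {
             if (visit(argument, argumentsByColumn, inProgress, done)) {
               return true;
             }
           }
           inProgress.remove(column);
           done.add(column);
           return false;
         }
       }
       ```

       A map where column A lists B as an argument and B lists A would be rejected; columns whose arguments are plain source fields (no entry in the map) simply terminate the walk.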

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/util/SchemaUtils.java
##########
@@ -39,65 +48,134 @@
 
   private static final Logger LOGGER = LoggerFactory.getLogger(SchemaUtils.class);
 
+  private static final SqlAbstractParserImpl.Metadata SQL_PARSER_METADATA = SqlParser.create("",
+      SqlParser.configBuilder().setConformance(SqlConformanceEnum.BABEL).setParserFactory(SqlBabelParserImpl.FACTORY)
+          .build()).getMetadata();
+
   /**
-   * Validates that for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
+   * Validates the following:
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
    */
   public static boolean validate(Schema schema) {
-    return validate(schema, LOGGER);
+    return validate(schema, true, LOGGER);
   }
 
   /**
    * Validates the following:
-   * 1) for a field spec with transform function, the source column name and destination column name are exclusive
-   * i.e. do not allow using source column name for destination column
-   * 2) Basic schema validations
+   * 1) Checks if sql reserved keywords are being used as field names. This check can be disabled, for not breaking existing schemas with invalid names
+   * 2) Checks valid transform function -
+   *   for a field spec with transform function, the source column name and destination column name are exclusive i.e. do not allow using source column name for destination column
+   *   ensure transform function string can be used to create a {@link FunctionEvaluator}
+   * 3) Checks valid timeFieldSpec - if incoming and outgoing granularity spec are different a) the names cannot be same b) cannot use SIMPLE_DATE_FORMAT for conversion
+   * 4) Checks valid dateTimeFieldSpecs - checks format and granularity string
+   * 5) Schema validations from {@link Schema#validate(Logger)}
+   *
+   * @param validateFieldNames if false, does not validate field names. This is to prevent validation failing on existing schemas with invalid field names during a schema update
    */
-  public static boolean validate(Schema schema, Logger logger) {
+  public static boolean validate(Schema schema, boolean validateFieldNames, @Nullable Logger logger) {
     try {
       for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
         if (!fieldSpec.isVirtualColumn()) {
-          String column = fieldSpec.getName();
-          String transformFunction = fieldSpec.getTransformFunction();
-          if (transformFunction != null) {
-            FunctionEvaluator functionEvaluator = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec);
-            if (functionEvaluator != null) {
-              List<String> arguments = functionEvaluator.getArguments();
-              // output column used as input
-              if (arguments.contains(column)) {
-                logger.error("The arguments of transform function: {}, should not contain the destination column: {}",
-                    transformFunction, column);
-                return false;
-              }
+          if (validateFieldNames && !isValidFieldName(fieldSpec)) {
+            return false;
+          }
+          if (!isValidTransformFunction(fieldSpec)) {
+            return false;
+          }
+          FieldSpec.FieldType fieldType = fieldSpec.getFieldType();
+          if (fieldType.equals(FieldSpec.FieldType.DATE_TIME)) {
+            if (!isValidDateTimeFieldSpec(fieldSpec)) {
+              return false;
             }
-          } else if (fieldSpec.getFieldType().equals(FieldSpec.FieldType.TIME)) {
-            TimeFieldSpec timeFieldSpec = (TimeFieldSpec) fieldSpec;
-            TimeGranularitySpec incomingGranularitySpec = timeFieldSpec.getIncomingGranularitySpec();
-            TimeGranularitySpec outgoingGranularitySpec = timeFieldSpec.getOutgoingGranularitySpec();
-
-            if (!incomingGranularitySpec.equals(outgoingGranularitySpec)) {
-              // different incoming and outgoing spec, but same name
-              if (incomingGranularitySpec.getName().equals(outgoingGranularitySpec.getName())) {
-                logger.error("Cannot convert from incoming field spec:{} to outgoing field spec:{} if name is the same",
-                    incomingGranularitySpec, outgoingGranularitySpec);
-                return false;
-              } else {
-                if (!incomingGranularitySpec.getTimeFormat().equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())
-                    || !outgoingGranularitySpec.getTimeFormat()
-                    .equals(TimeGranularitySpec.TimeFormat.EPOCH.toString())) {
-                  logger.error(
-                      "When incoming and outgoing specs are different, cannot perform time conversion for time format other than EPOCH");
-                  return false;
-                }
-              }
+          } else if (fieldType.equals(FieldSpec.FieldType.TIME)) {
+            if (!isValidTimeFieldSpec(fieldSpec)) {
+              return false;
             }
           }
         }
       }
     } catch (Exception e) {
-      logger.error("Exception in validating schema {}", schema.getSchemaName(), e);
+      LOGGER.error("Exception in validating schema {}", schema.getSchemaName(), e);
       return false;
     }
     return schema.validate(logger);
   }
+
+  /**
+   * Checks if any of the keywords which are reserved under the sql parser are used
+   */
+  private static boolean isValidFieldName(FieldSpec fieldSpec) {
+    String columnName = fieldSpec.getName();
+    if (SQL_PARSER_METADATA.isReservedWord(columnName.toUpperCase())) {
+      LOGGER.error("Cannot use SQL reserved word {} as field name in the schema", columnName);
+      return false;
+    }
+    return true;
+  }
+
+  /**
+   * Checks for valid transform function string
+   */
+  private static boolean isValidTransformFunction(FieldSpec fieldSpec) {
+    String column = fieldSpec.getName();
+    String transformFunction = fieldSpec.getTransformFunction();
+    if (transformFunction != null) {
+      try {
+        List<String> arguments = FunctionEvaluatorFactory.getExpressionEvaluator(fieldSpec).getArguments();
+        // output column used as input
+        if (arguments.contains(column)) {
+          LOGGER.error("The arguments of transform function: {}, should not contain the destination column: {}",

Review comment:
       Let us be consistent -- pick either "field" or "column" for error messages. I am fine with either one, but consistency is useful.






[GitHub] [incubator-pinot] npawar commented on pull request #5642: Add more schema validations

Posted by GitBox <gi...@apache.org>.
npawar commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-655059302


   Any comments or concerns @haibow @mayankshriv @mcvsubbu?




[GitHub] [incubator-pinot] npawar commented on pull request #5642: Add more schema validations

Posted by GitBox <gi...@apache.org>.
npawar commented on pull request #5642:
URL: https://github.com/apache/incubator-pinot/pull/5642#issuecomment-657840758


   I'm beginning to think we shouldn't do reserved keyword validation at all. Reasons:
   1. Standard SQL allows columns to have such names. By making this change, we would essentially be banning the use of certain words in Pinot tables forever, which might make migrating datasets from other sources into Pinot difficult.
   2. One of the motivations for doing this was to reduce how often our open source users get stuck on these words while querying. But this validation just moves the problem from query time to ingestion time. It might not always be possible for users to change their source data names, so we would have to educate them on setting transform functions.
   
   I'll change this PR to include only the dateTimeFieldSpec validation @siddharthteotia @mcvsubbu 
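   
   For reference, the dateTimeFieldSpec format string follows a `size:timeUnit:timeFormat[:pattern]` convention (e.g. `1:HOURS:EPOCH` or `1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd`) and the granularity string follows `size:timeUnit` (e.g. `15:MINUTES`). A rough standalone illustration of the kind of check this implies, written against that convention rather than against Pinot's own spec classes:
   
   ```java
   import java.util.concurrent.TimeUnit;
   
   public final class DateTimeStringCheck {
   
     // Format: "size:timeUnit:timeFormat[:pattern]", e.g. "1:HOURS:EPOCH".
     public static boolean isValidFormat(String format) {
       String[] tokens = format.split(":", 4);
       if (tokens.length < 3) {
         return false;
       }
       try {
         Integer.parseInt(tokens[0]);   // size must be numeric
         TimeUnit.valueOf(tokens[1]);   // time unit must be a valid java.util.concurrent.TimeUnit
       } catch (IllegalArgumentException e) {
         return false;                  // NumberFormatException is an IllegalArgumentException
       }
       if ("EPOCH".equals(tokens[2])) {
         return tokens.length == 3;     // EPOCH takes no pattern
       }
       // SIMPLE_DATE_FORMAT needs a non-empty pattern as the fourth token
       return "SIMPLE_DATE_FORMAT".equals(tokens[2]) && tokens.length == 4 && !tokens[3].isEmpty();
     }
   
     // Granularity: "size:timeUnit", e.g. "15:MINUTES".
     public static boolean isValidGranularity(String granularity) {
       String[] tokens = granularity.split(":");
       if (tokens.length != 2) {
         return false;
       }
       try {
         Integer.parseInt(tokens[0]);
         TimeUnit.valueOf(tokens[1]);
         return true;
       } catch (IllegalArgumentException e) {
         return false;
       }
     }
   }
   ```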
   
   

