You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/16 05:54:07 UTC

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #5709: Store column min/max value into segment metadata

Jackie-Jiang opened a new pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709


   ## Description
   Store column min/max value into segment metadata
   
   Column min/max value is very useful information and can be used for query optimization.
   When creating the segment, we already collected the column min/max value, but not save them to the metadata.
   
   Changes:
   - Store the already collected min/max value to the segment metadata
   - Handle the escaping of special character in property value string
   - Enhance the `SegmentColumnarIndexCreator.removeColumnMetadataInfo()` to handle the removal of new added properties (`PARTITION_FUNCTION` etc.)
   - Support min/max value for BYTES columns


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r455965675



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/columnminmaxvalue/ColumnMinMaxValueGenerator.java
##########
@@ -91,33 +92,46 @@ private void addColumnMinMaxValueForColumn(String columnName) throws Exception {
     switch (dataType) {
       case INT:
         try (IntDictionary intDictionary = new IntDictionary(dictionaryBuffer, length)) {
-          SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(_segmentProperties, columnName,
-              intDictionary.getStringValue(0), intDictionary.getStringValue(length - 1));
+          SegmentColumnarIndexCreator

Review comment:
       The `ColumnMinMaxValueGenerator` is on server side (one of the pre-processors) generating column min/max value on the fly during segment load. We might not want to pay the cost of scanning all the values during segment load, so I kept the existing behavior.
   The segment creation part is handled in `SegmentColumnarIndexCreator` where min/max values are added for both dictionary-encoded and raw index.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#issuecomment-659762915


   So to be on the same page, we are doing the following:
   
   1. Until now, the segment load code generated ColumnMinMaxValue **only for dictionary encoded columns.** This was then persisted in PropertiesConfiguration and made available to ColumnMetatadata and used for pruning. The default min-max value generator mode was for TIME column until now and this PR changes it to ALL. May be we should have a mode like DIMENSIONS ?
   
   2. This PR adds the support for generating the min-max value for **both raw and dictionary encoded columns** during segment generation and persisting in SegmentMetadata. The generator mode used here is ALL?
   
   3. Now that going forward we will have min-max available in SegmentMetadata, we detect that in segment load code and return. 
   
   @Jackie-Jiang , would be good to add/edit anything I may have missed. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r455905335



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/columnminmaxvalue/ColumnMinMaxValueGenerator.java
##########
@@ -91,33 +92,46 @@ private void addColumnMinMaxValueForColumn(String columnName) throws Exception {
     switch (dataType) {
       case INT:
         try (IntDictionary intDictionary = new IntDictionary(dictionaryBuffer, length)) {
-          SegmentColumnarIndexCreator.addColumnMinMaxValueInfo(_segmentProperties, columnName,
-              intDictionary.getStringValue(0), intDictionary.getStringValue(length - 1));
+          SegmentColumnarIndexCreator

Review comment:
       How about no-dictionary columns? We do go over all the values, so we should be able to compute that information even without dictionary?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +509,61 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Escape special character for STRING column
+      properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+      properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), escapeSpecialCharacter(defaultNullValue));
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
-    properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    // Escape special character for STRING column
+    properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+    properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+  }
+
+  /**
+   * Helper method to escape special character for the property value.
+   */
+  private static String escapeSpecialCharacter(String value) {

Review comment:
       Since this is only escaping line character, so may be it should be name `escapeLineCharacter`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r455955208



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +509,61 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Escape special character for STRING column
+      properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+      properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), escapeSpecialCharacter(defaultNullValue));
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
-    properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    // Escape special character for STRING column
+    properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+    properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+  }
+
+  /**
+   * Helper method to escape special character for the property value.
+   */
+  private static String escapeSpecialCharacter(String value) {

Review comment:
       Renamed to `escapeListSeparator`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r455934483



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +509,61 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Escape special character for STRING column
+      properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+      properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), escapeSpecialCharacter(defaultNullValue));
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,

Review comment:
       This was essentially setting the min and max value in SegmentMetadata right? It is then used in the code during pruning etc by deserializing into ColumnMetadata. May be I am missing something here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r455968907



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +509,61 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Escape special character for STRING column
+      properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+      properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), escapeSpecialCharacter(defaultNullValue));
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,

Review comment:
       Yes, the metadata will be stored into `SegmentMetadata`. `ColumnMetadata` is part of the `SegmentMetadata`. The min/max value inside the `DataSourceMetadata` are coming from there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#issuecomment-659780203


   @siddharthteotia 
   
   > 1. Until now, the segment load code generated ColumnMinMaxValue **only for dictionary encoded columns.** This was then persisted in PropertiesConfiguration and made available to ColumnMetatadata and used for pruning. The default min-max value generator mode was for TIME column until now and this PR changes it to ALL. May be we should have a mode like DIMENSIONS ?
   
   We made it tier-based: TIME -> DIMENSION -> METRICS
   ColumnMinMaxValueGenerator serves as a backup to add min/max value for existing segments. All new segments will have min/max value for all columns (both dictionary-encoded and raw).
   
   > 2. This PR adds the support for generating the min-max value for **both raw and dictionary encoded columns** during segment generation and persisting in SegmentMetadata. The generator mode used here is ALL?
   
   ColumnMinMaxValueGeneratorMode only associates with ColumnMinMaxValueGenerator, which generates column min/max value while loading the segments. It does not apply to the segment creation.
   
   > 3. Now that going forward we will have min-max available in SegmentMetadata, we detect that in segment load code and return.
   
   Yes. ColumnMinMaxValueGenerator generates column min/max value only when they are not available in the metadata.
   
   > However, not sure what is special about leading or trailing special character that PropertiesConfiguration can't handle. Would we have to continue to make the "isValidPropertyValue" code more robust?
   
   Leading and trailing whitespace will be trimmed by PropertiesConfiguration; list separator is preserved for the list. Other than that, there should be no rules for `isValidPropertyValue`. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang merged pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r456134670



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +509,61 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Escape special character for STRING column
+      properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+      properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), escapeSpecialCharacter(defaultNullValue));
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
-    properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    // Escape special character for STRING column
+    properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+    properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+  }
+
+  /**
+   * Helper method to escape special character for the property value.
+   */
+  private static String escapeSpecialCharacter(String value) {

Review comment:
       `PropertiesConfiguration` is handling the escape while writing and unescape while reading, but it cannot handle leading/trailing whitespace and list separator properly.
   Added a test and found out that there is no way to handle all scenarios. Change the logic to not storing the value if `PropertiesConfiguration` is not able to read the original value back.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r456169610



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +510,83 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Check special characters for STRING column
+      if (isValidPropertyValue(minValue)) {
+        properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      }
+      if (isValidPropertyValue(maxValue)) {
+        properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      }
+      if (isValidPropertyValue(defaultNullValue)) {
+        properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
+      }
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
-    properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    // Check special characters for STRING column
+    if (isValidPropertyValue(minValue)) {
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+    }
+    if (isValidPropertyValue(maxValue)) {
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    }
+  }
+
+  /**
+   * Helper method to check whether the given value is a valid property value.
+   * <p>Value is invalid iff:
+   * <ul>

Review comment:
       In `SegmentColumnarIndexCreatorTest.testPropertyValueWithSpecialCharacters()`, generating random strings to verify that the same string can be retrieved




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#issuecomment-659734915


   @mayankshriv @siddharthteotia @mcvsubbu Addressed the comments, please take another look


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r456158890



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +510,83 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Check special characters for STRING column
+      if (isValidPropertyValue(minValue)) {
+        properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      }
+      if (isValidPropertyValue(maxValue)) {
+        properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      }
+      if (isValidPropertyValue(defaultNullValue)) {
+        properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
+      }
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
-    properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    // Check special characters for STRING column
+    if (isValidPropertyValue(minValue)) {
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+    }
+    if (isValidPropertyValue(maxValue)) {
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    }
+  }
+
+  /**
+   * Helper method to check whether the given value is a valid property value.
+   * <p>Value is invalid iff:
+   * <ul>

Review comment:
       Not sure how we find out that these are the only comprehensive rules




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5709: Store column min/max value into segment metadata

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5709:
URL: https://github.com/apache/incubator-pinot/pull/5709#discussion_r455963749



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
##########
@@ -509,71 +509,61 @@ public static void addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, HAS_NULL_VALUE), String.valueOf(columnIndexCreationInfo.hasNulls()));
     properties.setProperty(getKeyFor(column, HAS_DICTIONARY), String.valueOf(hasDictionary));
     properties.setProperty(getKeyFor(column, TEXT_INDEX_TYPE), textIndexType.name());
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, HAS_INVERTED_INDEX),
-        String.valueOf(hasInvertedIndex));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_SINGLE_VALUED),
-        String.valueOf(fieldSpec.isSingleValueField()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
+    properties.setProperty(getKeyFor(column, HAS_INVERTED_INDEX), String.valueOf(hasInvertedIndex));
+    properties.setProperty(getKeyFor(column, IS_SINGLE_VALUED), String.valueOf(fieldSpec.isSingleValueField()));
+    properties.setProperty(getKeyFor(column, MAX_MULTI_VALUE_ELEMTS),
         String.valueOf(columnIndexCreationInfo.getMaxNumberOfMultiValueElements()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
+    properties.setProperty(getKeyFor(column, TOTAL_NUMBER_OF_ENTRIES),
         String.valueOf(columnIndexCreationInfo.getTotalNumberOfEntries()));
-    properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, IS_AUTO_GENERATED),
-        String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
+    properties
+        .setProperty(getKeyFor(column, IS_AUTO_GENERATED), String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
 
     PartitionFunction partitionFunction = columnIndexCreationInfo.getPartitionFunction();
     if (partitionFunction != null) {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_FUNCTION),
-          partitionFunction.toString());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, NUM_PARTITIONS),
-          columnIndexCreationInfo.getNumPartitions());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, PARTITION_VALUES),
-          columnIndexCreationInfo.getPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_FUNCTION), partitionFunction.toString());
+      properties.setProperty(getKeyFor(column, NUM_PARTITIONS), columnIndexCreationInfo.getNumPartitions());
+      properties.setProperty(getKeyFor(column, PARTITION_VALUES), columnIndexCreationInfo.getPartitions());
     }
 
     // datetime field
     if (fieldSpec.getFieldType().equals(FieldType.DATE_TIME)) {
       DateTimeFieldSpec dateTimeFieldSpec = (DateTimeFieldSpec) fieldSpec;
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_FORMAT),
-          dateTimeFieldSpec.getFormat());
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DATETIME_GRANULARITY),
-          dateTimeFieldSpec.getGranularity());
+      properties.setProperty(getKeyFor(column, DATETIME_FORMAT), dateTimeFieldSpec.getFormat());
+      properties.setProperty(getKeyFor(column, DATETIME_GRANULARITY), dateTimeFieldSpec.getGranularity());
     }
 
-    Object defaultNullValue = columnIndexCreationInfo.getDefaultNullValue();
-    if (defaultNullValue instanceof byte[]) {
-      String defaultNullValueString = BytesUtils.toHexString((byte[]) defaultNullValue);
-      properties
-          .setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValueString);
+    String minValue = columnIndexCreationInfo.getMin().toString();
+    String maxValue = columnIndexCreationInfo.getMax().toString();
+    String defaultNullValue = columnIndexCreationInfo.getDefaultNullValue().toString();
+    if (dataType == DataType.STRING) {
+      // Escape special character for STRING column
+      properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+      properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), escapeSpecialCharacter(defaultNullValue));
     } else {
-      properties.setProperty(V1Constants.MetadataKeys.Column.getKeyFor(column, DEFAULT_NULL_VALUE),
-          String.valueOf(defaultNullValue));
+      properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
+      properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+      properties.setProperty(getKeyFor(column, DEFAULT_NULL_VALUE), defaultNullValue);
     }
   }
 
   public static void addColumnMinMaxValueInfo(PropertiesConfiguration properties, String column, String minValue,
       String maxValue) {
-    properties.setProperty(getKeyFor(column, MIN_VALUE), minValue);
-    properties.setProperty(getKeyFor(column, MAX_VALUE), maxValue);
+    // Escape special character for STRING column
+    properties.setProperty(getKeyFor(column, MIN_VALUE), escapeSpecialCharacter(minValue));
+    properties.setProperty(getKeyFor(column, MAX_VALUE), escapeSpecialCharacter(maxValue));
+  }
+
+  /**
+   * Helper method to escape special character for the property value.
+   */
+  private static String escapeSpecialCharacter(String value) {

Review comment:
       Why are we escaping commas?
   
   Also, if we escape commas, don't we need to unescape them when reading the metadata back?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org