Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/03/21 11:17:13 UTC

[GitHub] [pinot] gortiz commented on a diff in pull request #10380: [feature] [backward-incompat] [null support # 2] Preserve null literal information in literal context and literal transform

gortiz commented on code in PR #10380:
URL: https://github.com/apache/pinot/pull/10380#discussion_r1143222905


##########
pinot-common/src/main/java/org/apache/pinot/common/request/context/LiteralContext.java:
##########
@@ -37,68 +45,116 @@ public class LiteralContext {
   private FieldSpec.DataType _type;
   private Object _value;
 
-  // TODO: Support all data types.
-  private static FieldSpec.DataType convertThriftTypeToDataType(Literal._Fields fields) {
-    switch (fields) {
-      case LONG_VALUE:
-        return FieldSpec.DataType.LONG;
-      case BOOL_VALUE:
-        return FieldSpec.DataType.BOOLEAN;
-      case DOUBLE_VALUE:
-        return FieldSpec.DataType.DOUBLE;
-      case STRING_VALUE:
-        return FieldSpec.DataType.STRING;
+  private BigDecimal _bigDecimalValue;
+
+  private static BigDecimal getBigDecimalValue(FieldSpec.DataType type, Object value) {
+    switch (type){
+      case BIG_DECIMAL:
+        return (BigDecimal) value;
+      case BOOLEAN:
+        return PinotDataType.BOOLEAN.toBigDecimal(value);
+      case TIMESTAMP:
+        return PinotDataType.TIMESTAMP.toBigDecimal(Timestamp.valueOf(value.toString()));
       default:
-        throw new UnsupportedOperationException("Unsupported literal type:" + fields);
+        if(type.isNumeric()){
+          return new BigDecimal(value.toString());
+        }
+        return BigDecimal.ZERO;
     }
   }
 
-  private static Class<?> convertDataTypeToJavaType(FieldSpec.DataType dataType) {
-    switch (dataType) {
-      case INT:
-        return Integer.class;
-      case LONG:
-        return Long.class;
-      case BOOLEAN:
-        return Boolean.class;
-      case FLOAT:
-        return Float.class;
-      case DOUBLE:
-        return Double.class;
-      case STRING:
-        return String.class;
-      default:
-        throw new UnsupportedOperationException("Unsupported dataType:" + dataType);
+  @VisibleForTesting
+  static Pair<FieldSpec.DataType, Object> inferLiteralDataTypeAndValue(String literal) {
+    // Try to interpret the literal as number
+    try {
+      Number number = NumberUtils.createNumber(literal);
+      if  (number instanceof BigDecimal || number instanceof BigInteger) {
+        return ImmutablePair.of(FieldSpec.DataType.BIG_DECIMAL, new BigDecimal(literal));
+      } else {
+        return ImmutablePair.of(FieldSpec.DataType.STRING, literal);
+      }
+    } catch (Exception e) {
+      // Ignored
+    }
+
+    // Try to interpret the literal as TIMESTAMP
+    try {
+      Timestamp timestamp = Timestamp.valueOf(literal);
+      return ImmutablePair.of(FieldSpec.DataType.TIMESTAMP, timestamp);
+    } catch (Exception e) {
+      // Ignored
     }
+    return ImmutablePair.of(FieldSpec.DataType.STRING, literal);
   }
 
   public LiteralContext(Literal literal) {
-    _type = convertThriftTypeToDataType(literal.getSetField());
-    _value = literal.getFieldValue();
+    Preconditions.checkState(literal.getFieldValue() != null,
+        "Field value cannot be null for field:" + literal.getSetField());
+    switch (literal.getSetField()){
+      case BOOL_VALUE:
+        _type = FieldSpec.DataType.BOOLEAN;
+        _value = literal.getFieldValue();
+        break;
+      case DOUBLE_VALUE:
+        _type = FieldSpec.DataType.DOUBLE;
+        _value = literal.getFieldValue();
+        break;
+      case LONG_VALUE:
+        _type = FieldSpec.DataType.LONG;
+        _value = literal.getFieldValue();
+        break;
+      case NULL_VALUE:
+        _type = FieldSpec.DataType.UNKNOWN;
+        _value = null;
+        break;
+      case STRING_VALUE:
+        Pair<FieldSpec.DataType, Object> typeAndValue = inferLiteralDataTypeAndValue(literal.getFieldValue().toString());
+        _type = typeAndValue.getLeft();
+        _value = typeAndValue.getRight();
+        break;
+      default:
+        throw new UnsupportedOperationException("Unsupported data type:" + literal.getSetField());
+    }
+    _bigDecimalValue = getBigDecimalValue(_type, _value);

Review Comment:
   It is not clear to me what the difference is between `_value` and `_bigDecimalValue` in each case.
   
   * If `literal.getSetField()` is `DOUBLE_VALUE` or `LONG_VALUE`, `_value` is either a `Double` or a `Long`.
   * If `literal.getSetField()` is `STRING_VALUE`:
        * And the literal is an actual number (like `"123"` or `"1.3"`), then `_value` is the most specific `Number` implementation possible.
        * And the literal is not an actual number:
                * If it is a boolean, `_value` is a String like `"true"` or `"false"` and `_bigDecimalValue` is `1` or `0`.
                * If it is a timestamp, `_value` is a String (why?) like `"21312312"` and `_bigDecimalValue` is its representation as a `BigDecimal`.
   
   Then `_bigDecimalValue` is only used in `getXValue` where `X` is `Int`, `Double` or `BigDecimal`.
   
   For context, `BigDecimal` instances are quite large: they hold several attributes and therefore consume a significant amount of memory. I guess normal queries should not have tons of `LiteralContext` instances, but we will receive degenerate queries like `where X in [list with hundreds/thousands of elements]` or `where X = L1 or X = L2 or ... X = L1000`. I had to deal with these cases in other databases, and inefficiencies that looked innocent suddenly led to OOMs.
   
   Therefore I would strongly recommend avoiding `BigDecimal` when it is not needed. In this case:
   - Most of the time we could first check whether `_value` is an instance of `Number`, in which case we can just call `intValue` or `doubleValue`.
   - Replace the final `_bigDecimalValue` with a mutable `_numberValue` (or a final `AtomicReference`). This attribute would not be instantiated unless `getBigDecimalValue` is called, or `getIntValue` or `getDoubleValue` is called on a literal of non-numeric type.
   - Therefore `getXValue` should check whether the cached value is null and compute it if so.
   - When the value is computed, ideally we should not use a `BigDecimal` unless we are actually calling `getBigDecimalValue`, in which case we have no other option.
   
   By doing that we may reduce the amount of memory used in degenerate queries like the ones I listed above.
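   
   To make the suggestion concrete, here is a minimal sketch of the lazy approach. `LazyLiteral`, `isBigDecimalCached` and `computeBigDecimal` are hypothetical names used only for illustration, not existing Pinot classes or methods, and the conversion rules are simplified compared to the diff above (for example, there is no `TIMESTAMP` handling). It is not thread-safe as written; a `volatile` field or an `AtomicReference` would be needed if instances are shared across threads.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for LiteralContext; illustrative only, not Pinot's API.
final class LazyLiteral {
  private final Object _value;
  // Stays null until a caller actually needs a BigDecimal.
  private BigDecimal _bigDecimalValue;

  LazyLiteral(Object value) {
    _value = value;
  }

  int getIntValue() {
    // Fast path: Long, Double, Integer, ... need no BigDecimal allocation.
    if (_value instanceof Number) {
      return ((Number) _value).intValue();
    }
    return getBigDecimalValue().intValue();
  }

  double getDoubleValue() {
    if (_value instanceof Number) {
      return ((Number) _value).doubleValue();
    }
    return getBigDecimalValue().doubleValue();
  }

  BigDecimal getBigDecimalValue() {
    BigDecimal result = _bigDecimalValue;
    if (result == null) {
      // Computed on first use and cached for later calls.
      result = computeBigDecimal(_value);
      _bigDecimalValue = result;
    }
    return result;
  }

  // Exposed only so the laziness can be observed in a test.
  boolean isBigDecimalCached() {
    return _bigDecimalValue != null;
  }

  private static BigDecimal computeBigDecimal(Object value) {
    if (value instanceof BigDecimal) {
      return (BigDecimal) value;
    }
    if (value instanceof Number) {
      // Assumes a finite number; NaN or infinity would throw NumberFormatException.
      return new BigDecimal(value.toString());
    }
    if (value instanceof Boolean) {
      return ((Boolean) value) ? BigDecimal.ONE : BigDecimal.ZERO;
    }
    // Non-numeric literal: fall back to zero, mirroring the default branch in the diff.
    return BigDecimal.ZERO;
  }
}
```

   With this shape, a literal holding a `Long` that is only ever read through `getIntValue` or `getDoubleValue` never allocates a `BigDecimal`, which is the case that matters for long `IN` lists.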
   
   I'm assuming these `getXValue` methods are going to be called in the hot path. If they are not, I would not even bother caching the value.
   - `getBigDecimalValue` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

