You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by sn...@apache.org on 2023/11/24 02:58:21 UTC

(pinot) branch master updated (b3af476b83 -> f79b618141)

This is an automated email from the ASF dual-hosted git repository.

snlee pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


    from b3af476b83 Prevent inverted index on a non dictionary column in table config (#12043)
     new 95d4950dab Added new SpecialValueTransformer and tests.
     new 5c8f43ead1 fix typo in comment.
     new 2c05f8dca9 addressed comments and added test for ensuring order of transformers.
     new 06e91c13a1 added code to remove NaN from multivalued columns and modified order of transformers.
     new 05be5cfd9a modifed test to ensure order of transformers.
     new f79b618141 fixed note in SpecialValueTransformer.

The 6 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../recordtransformer/CompositeTransformer.java    |  12 +-
 .../recordtransformer/SpecialValueTransformer.java | 113 ++++++++++++++++++
 .../recordtransformer/RecordTransformerTest.java   | 127 ++++++++++++++++++++-
 3 files changed, 248 insertions(+), 4 deletions(-)
 create mode 100644 pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


(pinot) 02/06: fix typo in comment.

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit 5c8f43ead1912b4f78a3ec91630fbeb0073698ab
Author: Aishik <ai...@startree.ai>
AuthorDate: Mon Nov 20 19:45:01 2023 +0530

    fix typo in comment.
---
 .../pinot/segment/local/recordtransformer/SpecialValueTransformer.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
index 2bd5b9920d..9e55c84cf1 100644
--- a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
@@ -26,7 +26,7 @@ import org.apache.pinot.spi.data.readers.GenericRow;
 
 
 /**
- * The {@code SpecialValueTransformer} class will transform special values the values to follow certain rules including:
+ * The {@code SpecialValueTransformer} class will transform special values according to the following rules:
  * <ul>
  *   <li>Negative zero (-0.0) should be converted to 0.0</li>
  *   <li>NaN should be converted to default null</li>


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


(pinot) 04/06: added code to remove NaN from multivalued columns and modified order of transformers.

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit 06e91c13a11c5a07939adf326de28e95e39f4596
Author: Aishik <ai...@startree.ai>
AuthorDate: Thu Nov 23 17:56:03 2023 +0530

    added code to remove NaN from multivalued columns and modified order of transformers.
---
 .../recordtransformer/CompositeTransformer.java    |  7 ++++---
 .../recordtransformer/SpecialValueTransformer.java | 18 ++++++++++------
 .../recordtransformer/RecordTransformerTest.java   | 24 +++++++++++++---------
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java
index 50ca2a97c1..0ba394e9c9 100644
--- a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java
@@ -70,15 +70,16 @@ public class CompositeTransformer implements RecordTransformer {
    *   </li>
    *   <li>
    *     {@link SpecialValueTransformer} after {@link DataTypeTransformer} so that we already have the values complying
-   *      with the schema before handling special values
+   *      with the schema before handling special values and before {@link NullValueTransformer} so that it transforms
+   *      all the null values properly
    *   </li>
    * </ul>
    */
   public static List<RecordTransformer> getDefaultTransformers(TableConfig tableConfig, Schema schema) {
     return Stream.of(new ExpressionTransformer(tableConfig, schema), new FilterTransformer(tableConfig),
             new SchemaConformingTransformer(tableConfig, schema), new DataTypeTransformer(tableConfig, schema),
-            new TimeValidationTransformer(tableConfig, schema), new NullValueTransformer(tableConfig, schema),
-            new SpecialValueTransformer(schema), new SanitizationTransformer(schema)).filter(t -> !t.isNoOp())
+            new TimeValidationTransformer(tableConfig, schema), new SpecialValueTransformer(schema),
+            new NullValueTransformer(tableConfig, schema), new SanitizationTransformer(schema)).filter(t -> !t.isNoOp())
         .collect(Collectors.toList());
   }
 
diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
index 4cfbbc8ce9..26df94933a 100644
--- a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
@@ -18,7 +18,9 @@
  */
 package org.apache.pinot.segment.local.recordtransformer;
 
+import java.util.ArrayList;
 import java.util.HashSet;
+import java.util.List;
 import org.apache.pinot.spi.data.FieldSpec;
 import org.apache.pinot.spi.data.FieldSpec.DataType;
 import org.apache.pinot.spi.data.Schema;
@@ -65,10 +67,10 @@ public class SpecialValueTransformer implements RecordTransformer {
   private Object transformNaN(Object value) {
     if ((value instanceof Float) && ((Float) value).isNaN()) {
       LOGGER.info("Float.NaN detected, converting to default null.");
-      value = FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT;
+      value = null;
     } else if ((value instanceof Double) && ((Double) value).isNaN()) {
       LOGGER.info("Double.NaN detected, converting to default null.");
-      value = FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE;
+      value = null;
     }
     return value;
   }
@@ -86,12 +88,16 @@ public class SpecialValueTransformer implements RecordTransformer {
         // Multi-valued column.
         Object[] values = (Object[]) value;
         int numValues = values.length;
-        for (int i = 0; i < numValues; i++) {
-          if (values[i] != null) {
-            values[i] = transformNegativeZero(values[i]);
-            values[i] = transformNaN(values[i]);
+        List<Object> negativeZeroNanSanitizedValues = new ArrayList<>(numValues);
+        int numberOfElements = values.length;
+        for (Object o : values) {
+          Object zeroTransformedValue = transformNegativeZero(o);
+          Object nanTransformedValue = transformNaN(zeroTransformedValue);
+          if (nanTransformedValue != null) {
+            negativeZeroNanSanitizedValues.add(nanTransformedValue);
           }
         }
+        record.putValue(element,negativeZeroNanSanitizedValues.toArray());
       } else {
         // Single-valued column.
         Object zeroTransformedValue = transformNegativeZero(value);
diff --git a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
index 6361963785..2685f8b6da 100644
--- a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
+++ b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
@@ -19,6 +19,8 @@
 package org.apache.pinot.segment.local.recordtransformer;
 
 import java.sql.Timestamp;
+import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
 import java.util.concurrent.TimeUnit;
@@ -286,12 +288,12 @@ public class RecordTransformerTest {
           Double.doubleToRawLongBits(0.0d));
       assertEquals(record.getValue("mvFloatNegativeZero"), new Float[]{0.0f, 1.0f, 0.0f, 3.0f});
       assertEquals(record.getValue("mvDoubleNegativeZero"), new Double[]{0.0d, 1.0d, 0.0d, 3.0d});
-      assertEquals(record.getValue("svFloatNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT);
-      assertEquals(record.getValue("svDoubleNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE);
+      assertNull(record.getValue("svFloatNaN"));
+      assertNull(record.getValue("svDoubleNaN"));
       assertEquals(record.getValue("mvFloatNaN"),
-          new Float[]{0.0f, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT, 2.0f});
+          new Float[]{0.0f, 2.0f});
       assertEquals(record.getValue("mvDoubleNaN"),
-          new Double[]{0.0d, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE, 2.0d});
+          new Double[]{0.0d, 2.0d});
     }
   }
 
@@ -302,7 +304,8 @@ public class RecordTransformerTest {
     // Build Schema and ingestionConfig in such a way that all the transformers are loaded.
     Schema schema = new Schema.SchemaBuilder().addSingleValueDimension("svInt", DataType.INT)
         .addSingleValueDimension("svDouble", DataType.DOUBLE)
-        .addSingleValueDimension("expressionTestColumn", DataType.INT).addSingleValueDimension("svNaN", DataType.FLOAT)
+        .addSingleValueDimension("expressionTestColumn", DataType.INT)
+        .addSingleValueDimension("svNaN", DataType.FLOAT).addMultiValueDimension("mvNaN",DataType.FLOAT)
         .addSingleValueDimension("emptyDimensionForNullValueTransformer", DataType.FLOAT)
         .addSingleValueDimension("svStringWithNullCharacters", DataType.STRING)
         .addSingleValueDimension("indexableExtras", DataType.JSON)
@@ -327,8 +330,8 @@ public class RecordTransformerTest {
     List<RecordTransformer> expectedListOfTransformers =
         List.of(new ExpressionTransformer(tableConfig, schema), new FilterTransformer(tableConfig),
             new SchemaConformingTransformer(tableConfig, schema), new DataTypeTransformer(tableConfig, schema),
-            new TimeValidationTransformer(tableConfig, schema), new NullValueTransformer(tableConfig, schema),
-            new SpecialValueTransformer(schema), new SanitizationTransformer(schema));
+            new TimeValidationTransformer(tableConfig, schema), new SpecialValueTransformer(schema),
+            new NullValueTransformer(tableConfig, schema), new SanitizationTransformer(schema));
 
     // Check that the number of current transformers match the expected number of transformers.
     assertEquals(currentListOfTransformers.size(), NUMBER_OF_TRANSFORMERS);
@@ -349,6 +352,7 @@ public class RecordTransformerTest {
 
     // Data for SpecialValue Transformer.
     record.putValue("svNaN", Float.NaN);
+    record.putValue("mvNaN",new Float[]{1.0f,Float.NaN,2.0f});
 
     // Data for sanitization transformer.
     record.putValue("svStringWithNullCharacters", "1\0002\0003");
@@ -645,10 +649,10 @@ public class RecordTransformerTest {
       assertEquals(record.getValue("svFloatNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT);
       assertEquals(record.getValue("svDoubleNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE);
       assertEquals(record.getValue("mvFloatNaN"),
-          new Float[]{0.0f, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT, 2.0f});
+          new Float[]{0.0f, 2.0f});
       assertEquals(record.getValue("mvDoubleNaN"),
-          new Double[]{0.0d, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE, 2.0d});
-      assertTrue(record.getNullValueFields().isEmpty());
+          new Double[]{0.0d, 2.0d});
+      assertEquals(new ArrayList<>(record.getNullValueFields()), new ArrayList<>(Arrays.asList("svFloatNaN","svDoubleNaN")));
     }
 
     // Test empty record


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


(pinot) 05/06: modifed test to ensure order of transformers.

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit 05be5cfd9ade65d2785a412f6b0692f7ba0c0eb6
Author: Aishik <ai...@startree.ai>
AuthorDate: Thu Nov 23 23:14:26 2023 +0530

    modifed test to ensure order of transformers.
---
 .../local/recordtransformer/SpecialValueTransformer.java |  2 +-
 .../local/recordtransformer/RecordTransformerTest.java   | 16 +++++++++-------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
index 26df94933a..3383803194 100644
--- a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
@@ -97,7 +97,7 @@ public class SpecialValueTransformer implements RecordTransformer {
             negativeZeroNanSanitizedValues.add(nanTransformedValue);
           }
         }
-        record.putValue(element,negativeZeroNanSanitizedValues.toArray());
+        record.putValue(element, negativeZeroNanSanitizedValues.toArray());
       } else {
         // Single-valued column.
         Object zeroTransformedValue = transformNegativeZero(value);
diff --git a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
index 2685f8b6da..410fa19564 100644
--- a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
+++ b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
@@ -305,9 +305,9 @@ public class RecordTransformerTest {
     Schema schema = new Schema.SchemaBuilder().addSingleValueDimension("svInt", DataType.INT)
         .addSingleValueDimension("svDouble", DataType.DOUBLE)
         .addSingleValueDimension("expressionTestColumn", DataType.INT)
-        .addSingleValueDimension("svNaN", DataType.FLOAT).addMultiValueDimension("mvNaN",DataType.FLOAT)
+        .addSingleValueDimension("svNaN", DataType.FLOAT).addMultiValueDimension("mvNaN", DataType.FLOAT)
         .addSingleValueDimension("emptyDimensionForNullValueTransformer", DataType.FLOAT)
-        .addSingleValueDimension("svStringWithNullCharacters", DataType.STRING)
+        .addSingleValueDimension("svStringNull", DataType.STRING)
         .addSingleValueDimension("indexableExtras", DataType.JSON)
         .addDateTime("timeCol", DataType.TIMESTAMP, "1:MILLISECONDS:TIMESTAMP", "1:MILLISECONDS").build();
 
@@ -326,7 +326,7 @@ public class RecordTransformerTest {
     List<RecordTransformer> currentListOfTransformers =
         CompositeTransformer.getDefaultTransformers(tableConfig, schema);
 
-    // Create a list of transformers to compare.
+    // Create a list of transformers in the original order to compare.
     List<RecordTransformer> expectedListOfTransformers =
         List.of(new ExpressionTransformer(tableConfig, schema), new FilterTransformer(tableConfig),
             new SchemaConformingTransformer(tableConfig, schema), new DataTypeTransformer(tableConfig, schema),
@@ -352,14 +352,15 @@ public class RecordTransformerTest {
 
     // Data for SpecialValue Transformer.
     record.putValue("svNaN", Float.NaN);
-    record.putValue("mvNaN",new Float[]{1.0f,Float.NaN,2.0f});
+    record.putValue("mvNaN", new Float[]{1.0f, Float.NaN, 2.0f});
 
     // Data for sanitization transformer.
-    record.putValue("svStringWithNullCharacters", "1\0002\0003");
+    record.putValue("svStringNull", null);
 
     for (int i = 0; i < NUMBER_OF_TRANSFORMERS; i++) {
+      GenericRow copyRecord = record.copy();
       GenericRow currentRecord = currentListOfTransformers.get(i).transform(record);
-      GenericRow expectedRecord = expectedListOfTransformers.get(i).transform(record);
+      GenericRow expectedRecord = expectedListOfTransformers.get(i).transform(copyRecord);
       assertEquals(currentRecord, expectedRecord);
       record = expectedRecord;
     }
@@ -652,7 +653,8 @@ public class RecordTransformerTest {
           new Float[]{0.0f, 2.0f});
       assertEquals(record.getValue("mvDoubleNaN"),
           new Double[]{0.0d, 2.0d});
-      assertEquals(new ArrayList<>(record.getNullValueFields()), new ArrayList<>(Arrays.asList("svFloatNaN","svDoubleNaN")));
+      assertEquals(new ArrayList<>(record.getNullValueFields()),
+          new ArrayList<>(Arrays.asList("svFloatNaN", "svDoubleNaN")));
     }
 
     // Test empty record


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


(pinot) 01/06: Added new SpecialValueTransformer and tests.

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit 95d4950dab166373ed9ec58555435852e5f2ffeb
Author: Aishik <ai...@startree.ai>
AuthorDate: Mon Nov 20 19:18:40 2023 +0530

    Added new SpecialValueTransformer and tests.
---
 .../recordtransformer/CompositeTransformer.java    | 11 ++-
 .../recordtransformer/SpecialValueTransformer.java | 96 ++++++++++++++++++++++
 .../recordtransformer/RecordTransformerTest.java   | 49 +++++++++++
 3 files changed, 153 insertions(+), 3 deletions(-)

diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java
index b6fb694306..50ca2a97c1 100644
--- a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/CompositeTransformer.java
@@ -68,13 +68,18 @@ public class CompositeTransformer implements RecordTransformer {
    *     Optional {@link SanitizationTransformer} after {@link NullValueTransformer} so that before sanitation, all
    *     values are non-null and follow the data types defined in the schema
    *   </li>
+   *   <li>
+   *     {@link SpecialValueTransformer} after {@link DataTypeTransformer} so that we already have the values complying
+   *      with the schema before handling special values
+   *   </li>
    * </ul>
    */
   public static List<RecordTransformer> getDefaultTransformers(TableConfig tableConfig, Schema schema) {
     return Stream.of(new ExpressionTransformer(tableConfig, schema), new FilterTransformer(tableConfig),
-        new SchemaConformingTransformer(tableConfig, schema), new DataTypeTransformer(tableConfig, schema),
-        new TimeValidationTransformer(tableConfig, schema), new NullValueTransformer(tableConfig, schema),
-        new SanitizationTransformer(schema)).filter(t -> !t.isNoOp()).collect(Collectors.toList());
+            new SchemaConformingTransformer(tableConfig, schema), new DataTypeTransformer(tableConfig, schema),
+            new TimeValidationTransformer(tableConfig, schema), new NullValueTransformer(tableConfig, schema),
+            new SpecialValueTransformer(schema), new SanitizationTransformer(schema)).filter(t -> !t.isNoOp())
+        .collect(Collectors.toList());
   }
 
   public static CompositeTransformer getDefaultTransformer(TableConfig tableConfig, Schema schema) {
diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
new file mode 100644
index 0000000000..2bd5b9920d
--- /dev/null
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
@@ -0,0 +1,96 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.recordtransformer;
+
+import java.util.HashSet;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+
+
+/**
+ * The {@code SpecialValueTransformer} class will transform special values the values to follow certain rules including:
+ * <ul>
+ *   <li>Negative zero (-0.0) should be converted to 0.0</li>
+ *   <li>NaN should be converted to default null</li>
+ * </ul>
+ * <p>NOTE: should put this after the {@link DataTypeTransformer} so that all values follow the data types in
+ * {@link FieldSpec}.
+ */
+public class SpecialValueTransformer implements RecordTransformer {
+  private final HashSet<String> _specialValuesKeySet = new HashSet<>();
+
+  public SpecialValueTransformer(Schema schema) {
+    for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
+      if (!fieldSpec.isVirtualColumn() && (fieldSpec.getDataType() == DataType.FLOAT
+          || fieldSpec.getDataType() == DataType.DOUBLE)) {
+        _specialValuesKeySet.add(fieldSpec.getName());
+      }
+    }
+  }
+
+  private Object transformNegativeZero(Object value) {
+    if ((value instanceof Float) && (Float.floatToRawIntBits((float) value) == Float.floatToRawIntBits(-0.0f))) {
+      value = 0.0f;
+    } else if ((value instanceof Double) && (Double.doubleToLongBits((double) value) == Double.doubleToLongBits(
+        -0.0d))) {
+      value = 0.0d;
+    }
+    return value;
+  }
+
+  private Object transformNaN(Object value) {
+    if ((value instanceof Float) && ((Float) value).isNaN()) {
+      value = FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT;
+    } else if ((value instanceof Double) && ((Double) value).isNaN()) {
+      value = FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE;
+    }
+    return value;
+  }
+
+  @Override
+  public boolean isNoOp() {
+    return _specialValuesKeySet.isEmpty();
+  }
+
+  @Override
+  public GenericRow transform(GenericRow record) {
+    for (String element : _specialValuesKeySet) {
+      Object value = record.getValue(element);
+      if (value instanceof Float || value instanceof Double) {
+        // Single-valued column.
+        Object zeroTransformedValue = transformNegativeZero(value);
+        Object nanTransformedValue = transformNaN(zeroTransformedValue);
+        if (nanTransformedValue != value) {
+          record.putValue(element, nanTransformedValue);
+        }
+      } else if (value instanceof Object[]) {
+        // Multi-valued column.
+        Object[] values = (Object[]) value;
+        int numValues = values.length;
+        for (int i = 0; i < numValues; i++) {
+          values[i] = transformNegativeZero(values[i]);
+          values[i] = transformNaN(values[i]);
+        }
+      }
+    }
+    return record;
+  }
+}
diff --git a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
index baded1b97a..e48b6605f5 100644
--- a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
+++ b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
@@ -51,6 +51,13 @@ public class RecordTransformerTest {
       .addSingleValueDimension("svStringWithNullCharacters", DataType.STRING)
       .addSingleValueDimension("svStringWithLengthLimit", DataType.STRING)
       .addMultiValueDimension("mvString1", DataType.STRING).addMultiValueDimension("mvString2", DataType.STRING)
+      // For negative zero and NaN conversions
+      .addSingleValueDimension("svFloatNegativeZero", DataType.FLOAT)
+      .addMultiValueDimension("mvFloatNegativeZero", DataType.FLOAT)
+      .addSingleValueDimension("svDoubleNegativeZero", DataType.DOUBLE)
+      .addMultiValueDimension("mvDoubleNegativeZero", DataType.DOUBLE)
+      .addSingleValueDimension("svFloatNaN", DataType.FLOAT).addMultiValueDimension("mvFloatNaN", DataType.FLOAT)
+      .addSingleValueDimension("svDoubleNaN", DataType.DOUBLE).addMultiValueDimension("mvDoubleNaN", DataType.DOUBLE)
       .build();
   private static final TableConfig TABLE_CONFIG =
       new TableConfigBuilder(TableType.OFFLINE).setTableName("testTable").build();
@@ -82,6 +89,14 @@ public class RecordTransformerTest {
     record.putValue("mvString1", new Object[]{"123", 123, 123L, 123f, 123.0});
     record.putValue("mvString2", new Object[]{123, 123L, 123f, 123.0, "123"});
     record.putValue("svNullString", null);
+    record.putValue("svFloatNegativeZero", -0.00f);
+    record.putValue("svDoubleNegativeZero", -0.00d);
+    record.putValue("mvFloatNegativeZero", new Float[]{-0.0f, 1.0f, 0.0f, 3.0f});
+    record.putValue("mvDoubleNegativeZero", new Double[]{-0.0d, 1.0d, 0.0d, 3.0d});
+    record.putValue("svFloatNaN", Float.NaN);
+    record.putValue("svDoubleNaN", Double.NaN);
+    record.putValue("mvFloatNaN", new Float[]{-0.0f, Float.NaN, 2.0f});
+    record.putValue("mvDoubleNaN", new Double[]{-0.0d, Double.NaN, 2.0d});
     return record;
   }
 
@@ -254,6 +269,28 @@ public class RecordTransformerTest {
     }
   }
 
+  @Test
+  public void testSpecialValueTransformer() {
+    RecordTransformer transformer = new SpecialValueTransformer(SCHEMA);
+    GenericRow record = getRecord();
+    for (int i = 0; i < NUM_ROUNDS; i++) {
+      record = transformer.transform(record);
+      assertNotNull(record);
+      assertEquals(Float.floatToRawIntBits((float) record.getValue("svFloatNegativeZero")),
+          Float.floatToRawIntBits(0.0f));
+      assertEquals(Double.doubleToRawLongBits((double) record.getValue("svDoubleNegativeZero")),
+          Double.doubleToRawLongBits(0.0d));
+      assertEquals(record.getValue("mvFloatNegativeZero"), new Float[]{0.0f, 1.0f, 0.0f, 3.0f});
+      assertEquals(record.getValue("mvDoubleNegativeZero"), new Double[]{0.0d, 1.0d, 0.0d, 3.0d});
+      assertEquals(record.getValue("svFloatNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT);
+      assertEquals(record.getValue("svDoubleNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE);
+      assertEquals(record.getValue("mvFloatNaN"),
+          new Float[]{0.0f, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT, 2.0f});
+      assertEquals(record.getValue("mvDoubleNaN"),
+          new Double[]{0.0d, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE, 2.0d});
+    }
+  }
+
   @Test
   public void testScalarOps() {
     IngestionConfig ingestionConfig = new IngestionConfig();
@@ -530,6 +567,18 @@ public class RecordTransformerTest {
       assertEquals(record.getValue("mvString2"), new Object[]{"123", "123", "123.0", "123.0", "123"});
       assertNull(record.getValue("$virtual"));
       assertTrue(record.getNullValueFields().isEmpty());
+      assertEquals(Float.floatToRawIntBits((float) record.getValue("svFloatNegativeZero")),
+          Float.floatToRawIntBits(0.0f));
+      assertEquals(Double.doubleToRawLongBits((double) record.getValue("svDoubleNegativeZero")),
+          Double.doubleToRawLongBits(0.0d));
+      assertEquals(record.getValue("mvFloatNegativeZero"), new Float[]{0.0f, 1.0f, 0.0f, 3.0f});
+      assertEquals(record.getValue("mvDoubleNegativeZero"), new Double[]{0.0d, 1.0d, 0.0d, 3.0d});
+      assertEquals(record.getValue("svFloatNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT);
+      assertEquals(record.getValue("svDoubleNaN"), FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE);
+      assertEquals(record.getValue("mvFloatNaN"),
+          new Float[]{0.0f, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT, 2.0f});
+      assertEquals(record.getValue("mvDoubleNaN"),
+          new Double[]{0.0d, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE, 2.0d});
     }
 
     // Test empty record


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


(pinot) 03/06: addressed comments and added test for ensuring order of transformers.

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit 2c05f8dca9db402de904c044264d54cc76858238
Author: Aishik <ai...@startree.ai>
AuthorDate: Wed Nov 22 18:54:24 2023 +0530

    addressed comments and added test for ensuring order of transformers.
---
 .../recordtransformer/SpecialValueTransformer.java | 28 ++++++---
 .../recordtransformer/RecordTransformerTest.java   | 72 +++++++++++++++++++++-
 2 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
index 9e55c84cf1..4cfbbc8ce9 100644
--- a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
@@ -23,6 +23,8 @@ import org.apache.pinot.spi.data.FieldSpec;
 import org.apache.pinot.spi.data.FieldSpec.DataType;
 import org.apache.pinot.spi.data.Schema;
 import org.apache.pinot.spi.data.readers.GenericRow;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 
 /**
@@ -35,6 +37,8 @@ import org.apache.pinot.spi.data.readers.GenericRow;
  * {@link FieldSpec}.
  */
 public class SpecialValueTransformer implements RecordTransformer {
+
+  private static final Logger LOGGER = LoggerFactory.getLogger(NullValueTransformer.class);
   private final HashSet<String> _specialValuesKeySet = new HashSet<>();
 
   public SpecialValueTransformer(Schema schema) {
@@ -48,9 +52,11 @@ public class SpecialValueTransformer implements RecordTransformer {
 
   private Object transformNegativeZero(Object value) {
     if ((value instanceof Float) && (Float.floatToRawIntBits((float) value) == Float.floatToRawIntBits(-0.0f))) {
+      LOGGER.info("-0.0f value detected, converting to 0.0.");
       value = 0.0f;
     } else if ((value instanceof Double) && (Double.doubleToLongBits((double) value) == Double.doubleToLongBits(
         -0.0d))) {
+      LOGGER.info("-0.0d value detected, converting to 0.0.");
       value = 0.0d;
     }
     return value;
@@ -58,8 +64,10 @@ public class SpecialValueTransformer implements RecordTransformer {
 
   private Object transformNaN(Object value) {
     if ((value instanceof Float) && ((Float) value).isNaN()) {
+      LOGGER.info("Float.NaN detected, converting to default null.");
       value = FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT;
     } else if ((value instanceof Double) && ((Double) value).isNaN()) {
+      LOGGER.info("Double.NaN detected, converting to default null.");
       value = FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE;
     }
     return value;
@@ -74,21 +82,23 @@ public class SpecialValueTransformer implements RecordTransformer {
   public GenericRow transform(GenericRow record) {
     for (String element : _specialValuesKeySet) {
       Object value = record.getValue(element);
-      if (value instanceof Float || value instanceof Double) {
+      if (value instanceof Object[]) {
+        // Multi-valued column.
+        Object[] values = (Object[]) value;
+        int numValues = values.length;
+        for (int i = 0; i < numValues; i++) {
+          if (values[i] != null) {
+            values[i] = transformNegativeZero(values[i]);
+            values[i] = transformNaN(values[i]);
+          }
+        }
+      } else {
         // Single-valued column.
         Object zeroTransformedValue = transformNegativeZero(value);
         Object nanTransformedValue = transformNaN(zeroTransformedValue);
         if (nanTransformedValue != value) {
           record.putValue(element, nanTransformedValue);
         }
-      } else if (value instanceof Object[]) {
-        // Multi-valued column.
-        Object[] values = (Object[]) value;
-        int numValues = values.length;
-        for (int i = 0; i < numValues; i++) {
-          values[i] = transformNegativeZero(values[i]);
-          values[i] = transformNaN(values[i]);
-        }
       }
     }
     return record;
diff --git a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
index e48b6605f5..6361963785 100644
--- a/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
+++ b/pinot-segment-local/src/test/java/org/apache/pinot/segment/local/recordtransformer/RecordTransformerTest.java
@@ -20,11 +20,14 @@ package org.apache.pinot.segment.local.recordtransformer;
 
 import java.sql.Timestamp;
 import java.util.Collections;
+import java.util.List;
 import java.util.concurrent.TimeUnit;
 import org.apache.pinot.spi.config.table.TableConfig;
 import org.apache.pinot.spi.config.table.TableType;
 import org.apache.pinot.spi.config.table.ingestion.FilterConfig;
 import org.apache.pinot.spi.config.table.ingestion.IngestionConfig;
+import org.apache.pinot.spi.config.table.ingestion.SchemaConformingTransformerConfig;
+import org.apache.pinot.spi.config.table.ingestion.TransformConfig;
 import org.apache.pinot.spi.data.DateTimeFormatSpec;
 import org.apache.pinot.spi.data.DimensionFieldSpec;
 import org.apache.pinot.spi.data.FieldSpec;
@@ -69,6 +72,7 @@ public class RecordTransformerTest {
 
   // Transform multiple times should return the same result
   private static final int NUM_ROUNDS = 5;
+  private static final int NUMBER_OF_TRANSFORMERS = 8;
 
   private static GenericRow getRecord() {
     GenericRow record = new GenericRow();
@@ -291,6 +295,72 @@ public class RecordTransformerTest {
     }
   }
 
+  @Test
+  public void testOrderForTransformers() {
+    // This test checks that the specified order is maintained for different transformers.
+
+    // Build Schema and ingestionConfig in such a way that all the transformers are loaded.
+    Schema schema = new Schema.SchemaBuilder().addSingleValueDimension("svInt", DataType.INT)
+        .addSingleValueDimension("svDouble", DataType.DOUBLE)
+        .addSingleValueDimension("expressionTestColumn", DataType.INT).addSingleValueDimension("svNaN", DataType.FLOAT)
+        .addSingleValueDimension("emptyDimensionForNullValueTransformer", DataType.FLOAT)
+        .addSingleValueDimension("svStringWithNullCharacters", DataType.STRING)
+        .addSingleValueDimension("indexableExtras", DataType.JSON)
+        .addDateTime("timeCol", DataType.TIMESTAMP, "1:MILLISECONDS:TIMESTAMP", "1:MILLISECONDS").build();
+
+    IngestionConfig ingestionConfig = new IngestionConfig();
+    TableConfig tableConfig =
+        new TableConfigBuilder(TableType.OFFLINE).setTableName("testTable").setIngestionConfig(ingestionConfig)
+            .setTimeColumnName("timeCol").build();
+    ingestionConfig.setFilterConfig(new FilterConfig("svInt = 123 AND svDouble <= 200"));
+    ingestionConfig.setTransformConfigs(List.of(new TransformConfig("expressionTestColumn", "plus(x,10)")));
+    ingestionConfig.setSchemaConformingTransformerConfig(
+        new SchemaConformingTransformerConfig("indexableExtras", null, null, null));
+    ingestionConfig.setRowTimeValueCheck(true);
+    ingestionConfig.setContinueOnError(false);
+
+    // Get the list of transformers.
+    List<RecordTransformer> currentListOfTransformers =
+        CompositeTransformer.getDefaultTransformers(tableConfig, schema);
+
+    // Create a list of transformers to compare.
+    List<RecordTransformer> expectedListOfTransformers =
+        List.of(new ExpressionTransformer(tableConfig, schema), new FilterTransformer(tableConfig),
+            new SchemaConformingTransformer(tableConfig, schema), new DataTypeTransformer(tableConfig, schema),
+            new TimeValidationTransformer(tableConfig, schema), new NullValueTransformer(tableConfig, schema),
+            new SpecialValueTransformer(schema), new SanitizationTransformer(schema));
+
+    // Check that the number of current transformers match the expected number of transformers.
+    assertEquals(currentListOfTransformers.size(), NUMBER_OF_TRANSFORMERS);
+
+    GenericRow record = new GenericRow();
+
+    // Data for expression Transformer.
+    record.putValue("expressionTestColumn", 100);
+
+    // Data for filter transformer.
+    record.putValue("svDouble", 123d);
+
+    // Data for DataType Transformer.
+    record.putValue("svInt", (byte) 123);
+
+    // Data for TimeValidation transformer.
+    record.putValue("timeCol", System.currentTimeMillis());
+
+    // Data for SpecialValue Transformer.
+    record.putValue("svNaN", Float.NaN);
+
+    // Data for sanitization transformer.
+    record.putValue("svStringWithNullCharacters", "1\0002\0003");
+
+    for (int i = 0; i < NUMBER_OF_TRANSFORMERS; i++) {
+      GenericRow currentRecord = currentListOfTransformers.get(i).transform(record);
+      GenericRow expectedRecord = expectedListOfTransformers.get(i).transform(record);
+      assertEquals(currentRecord, expectedRecord);
+      record = expectedRecord;
+    }
+  }
+
   @Test
   public void testScalarOps() {
     IngestionConfig ingestionConfig = new IngestionConfig();
@@ -566,7 +636,6 @@ public class RecordTransformerTest {
       assertEquals(record.getValue("mvString1"), new Object[]{"123", "123", "123", "123.0", "123.0"});
       assertEquals(record.getValue("mvString2"), new Object[]{"123", "123", "123.0", "123.0", "123"});
       assertNull(record.getValue("$virtual"));
-      assertTrue(record.getNullValueFields().isEmpty());
       assertEquals(Float.floatToRawIntBits((float) record.getValue("svFloatNegativeZero")),
           Float.floatToRawIntBits(0.0f));
       assertEquals(Double.doubleToRawLongBits((double) record.getValue("svDoubleNegativeZero")),
@@ -579,6 +648,7 @@ public class RecordTransformerTest {
           new Float[]{0.0f, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT, 2.0f});
       assertEquals(record.getValue("mvDoubleNaN"),
           new Double[]{0.0d, FieldSpec.DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE, 2.0d});
+      assertTrue(record.getNullValueFields().isEmpty());
     }
 
     // Test empty record


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


(pinot) 06/06: fixed note in SpecialValueTransformer.

Posted by sn...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit f79b618141e07c140663791959d96956ad91d811
Author: Aishik <ai...@startree.ai>
AuthorDate: Thu Nov 23 23:33:52 2023 +0530

    fixed note in SpecialValueTransformer.
---
 .../segment/local/recordtransformer/SpecialValueTransformer.java     | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
index 3383803194..29596e0935 100644
--- a/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
+++ b/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SpecialValueTransformer.java
@@ -35,8 +35,9 @@ import org.slf4j.LoggerFactory;
  *   <li>Negative zero (-0.0) should be converted to 0.0</li>
  *   <li>NaN should be converted to default null</li>
  * </ul>
- * <p>NOTE: should put this after the {@link DataTypeTransformer} so that all values follow the data types in
- * {@link FieldSpec}.
+ * <p>NOTE: should put this after the {@link DataTypeTransformer} so that we already have the values complying
+ * with the schema before handling special values and before {@link NullValueTransformer} so that it transforms
+ * all the null values properly.
  */
 public class SpecialValueTransformer implements RecordTransformer {
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org