Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:52:31 UTC

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #5760: Enhance DataTypeTransformer to handle nested Map/List/Object[]

Jackie-Jiang opened a new pull request #5760:
URL: https://github.com/apache/incubator-pinot/pull/5760


   ## Description
   Enhance DataTypeTransformer to handle nested Collection/Map/Object[]
   
   - Empty `Collection/Map/Object[]` will be treated as `null`
   - Single-entry `Collection/Map/Object[]` will be treated as single-value (map key is ignored)
   - Multi-entry `Collection/Map/Object[]` will be treated as multi-value (map key is ignored)
   - Move `NullValueTransformer` after `DataTypeTransformer` to handle the `null` from empty `Collection/Map/Object[]`
   
   NOTE:
   Multi-entry `Collection/Map/Object[]` will no longer be accepted for a single-value column (previously the first value was used, which causes data loss)
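
The rules listed in the description can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class name `StandardizeSketch`, the exception type, and the choice to flatten nested values recursively are assumptions made for the example. Only the method name `standardize` and its `(column, value, isSingleValue)` argument order are taken from the test calls quoted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the behavior described above; not the PR's implementation.
final class StandardizeSketch {
  private StandardizeSketch() {
  }

  /**
   * Returns null for an empty Collection/Map/Object[], the single element for a single-entry one,
   * and an Object[] for a multi-entry one. Throws when a multi-entry value is given for a
   * single-value column. Any other value is returned unchanged.
   */
  static Object standardize(String column, Object value, boolean isSingleValue) {
    if (value == null) {
      return null;
    }
    if (value instanceof Map) {
      // Map keys are ignored; only the values are kept
      return standardizeCollection(column, ((Map<?, ?>) value).values(), isSingleValue);
    }
    if (value instanceof Collection) {
      return standardizeCollection(column, (Collection<?>) value, isSingleValue);
    }
    if (value instanceof Object[]) {
      return standardizeCollection(column, Arrays.asList((Object[]) value), isSingleValue);
    }
    return value;
  }

  private static Object standardizeCollection(String column, Collection<?> collection, boolean isSingleValue) {
    // Standardize each element recursively and flatten the results, skipping nulls
    List<Object> flattened = new ArrayList<>();
    for (Object element : collection) {
      Object standardized = standardize(column, element, isSingleValue);
      if (standardized == null) {
        continue;
      }
      if (standardized instanceof Object[]) {
        flattened.addAll(Arrays.asList((Object[]) standardized));
      } else {
        flattened.add(standardized);
      }
    }
    if (flattened.isEmpty()) {
      return null;
    }
    if (flattened.size() == 1) {
      return flattened.get(0);
    }
    if (isSingleValue) {
      // Rejecting the value avoids silently dropping everything after the first entry
      throw new IllegalArgumentException("Cannot read single-value from multi-entry value for column: " + column);
    }
    return flattened.toArray();
  }

  public static void main(String[] args) {
    System.out.println(standardize("testColumn", Collections.emptyList(), false));
    System.out.println(standardize("testColumn", Collections.singletonMap("testKey", "testValue"), true));
    System.out.println(Arrays.toString((Object[]) standardize("testColumn", Arrays.asList("a", "b"), false)));
  }
}
```

In this sketch the empty case yields `null` rather than a default value, which is consistent with the description's point that `NullValueTransformer` runs afterwards to fill it in.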


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #5760: Enhance DataTypeTransformer to handle nested Map/List/Object[]

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #5760:
URL: https://github.com/apache/incubator-pinot/pull/5760#discussion_r461247207



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformer.java
##########
@@ -73,10 +80,15 @@ public GenericRow transform(GenericRow record) {
     for (Map.Entry<String, PinotDataType> entry : _dataTypes.entrySet()) {
       String column = entry.getKey();
       Object value = record.getValue(column);
-
-      // Convert List value to Object[]
-      if (value instanceof List) {
-        value = ((List) value).toArray();
+      if (value == null) {

Review comment:
       There is no need to have an extra check here since it's been covered in Line 89.

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformerTest.java
##########
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.recordtransformer;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertEqualsNoOrder;
+import static org.testng.Assert.assertNull;
+import static org.testng.Assert.fail;
+
+
+public class DataTypeTransformerTest {
+  private static final String COLUMN = "testColumn";
+
+  @Test
+  public void testStandardize() {
+    // Tests for Map
+    Map<String, Object> map = Collections.emptyMap();
+    assertNull(DataTypeTransformer.standardize(COLUMN, map, true));
+    assertNull(DataTypeTransformer.standardize(COLUMN, map, false));
+    String expectedValue = "testValue";
+    map = Collections.singletonMap("testKey", expectedValue);
+    assertEquals(DataTypeTransformer.standardize(COLUMN, map, true), expectedValue);
+    assertEquals(DataTypeTransformer.standardize(COLUMN, map, false), expectedValue);
+    Object[] expectedValues = new Object[]{"testValue1", "testValue2"};
+    map = new HashMap<>();
+    map.put("testKey1", "testValue1");
+    map.put("testKey2", "testValue2");
+    try {
+      DataTypeTransformer.standardize(COLUMN, map, true);

Review comment:
       Same here

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformerTest.java
##########
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.recordtransformer;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertEqualsNoOrder;
+import static org.testng.Assert.assertNull;
+import static org.testng.Assert.fail;
+
+
+public class DataTypeTransformerTest {
+  private static final String COLUMN = "testColumn";
+
+  @Test
+  public void testStandardize() {
+    // Tests for Map
+    Map<String, Object> map = Collections.emptyMap();
+    assertNull(DataTypeTransformer.standardize(COLUMN, map, true));
+    assertNull(DataTypeTransformer.standardize(COLUMN, map, false));
+    String expectedValue = "testValue";
+    map = Collections.singletonMap("testKey", expectedValue);
+    assertEquals(DataTypeTransformer.standardize(COLUMN, map, true), expectedValue);
+    assertEquals(DataTypeTransformer.standardize(COLUMN, map, false), expectedValue);
+    Object[] expectedValues = new Object[]{"testValue1", "testValue2"};
+    map = new HashMap<>();
+    map.put("testKey1", "testValue1");
+    map.put("testKey2", "testValue2");
+    try {
+      DataTypeTransformer.standardize(COLUMN, map, true);
+      fail();
+    } catch (Exception e) {
+      // Expected
+    }
+    assertEqualsNoOrder((Object[]) DataTypeTransformer.standardize(COLUMN, map, false), expectedValues);
+
+    // Tests for List
+    List<Object> list = Collections.emptyList();
+    assertNull(DataTypeTransformer.standardize(COLUMN, list, true));
+    assertNull(DataTypeTransformer.standardize(COLUMN, list, false));
+    list = Collections.singletonList(expectedValue);
+    assertEquals(DataTypeTransformer.standardize(COLUMN, list, true), expectedValue);
+    assertEquals(DataTypeTransformer.standardize(COLUMN, list, false), expectedValue);
+    list = Arrays.asList(expectedValues);
+    try {
+      DataTypeTransformer.standardize(COLUMN, list, true);

Review comment:
       Can you add a simple comment here on why it should fail (because of single value column)?

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformerTest.java
##########
@@ -0,0 +1,149 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.recordtransformer;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertEqualsNoOrder;
+import static org.testng.Assert.assertNull;
+import static org.testng.Assert.fail;
+
+
+public class DataTypeTransformerTest {
+  private static final String COLUMN = "testColumn";
+
+  @Test
+  public void testStandardize() {
+    // Tests for Map

Review comment:
       I think Line 138 in `DataTypeTransformerTest.java` is the example that we expect.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformer.java
##########
@@ -104,4 +115,76 @@ public GenericRow transform(GenericRow record) {
     }
     return record;
   }
+
+  /**
+   * Standardize the value into supported types.

Review comment:
       Can we add more description on what this method does and what should be the expected behavior?

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformerTest.java
##########
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.recordtransformer;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertEqualsNoOrder;
+import static org.testng.Assert.assertNull;
+import static org.testng.Assert.fail;
+
+
+public class DataTypeTransformerTest {

Review comment:
       It'd be good to include a test with values that can be converted to another type, like `100` and `200`, to make sure the converted output is correct.






[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5760: Enhance DataTypeTransformer to handle nested Map/List/Object[]

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5760:
URL: https://github.com/apache/incubator-pinot/pull/5760#discussion_r461237235



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformerTest.java
##########
@@ -0,0 +1,149 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.recordtransformer;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertEqualsNoOrder;
+import static org.testng.Assert.assertNull;
+import static org.testng.Assert.fail;
+
+
+public class DataTypeTransformerTest {
+  private static final String COLUMN = "testColumn";
+
+  @Test
+  public void testStandardize() {
+    // Tests for Map

Review comment:
       @siddharthteotia Added

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformer.java
##########
@@ -104,4 +115,76 @@ public GenericRow transform(GenericRow record) {
     }
     return record;
   }
+
+  /**
+   * Standardize the value into supported types.

Review comment:
       Added

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformer.java
##########
@@ -73,10 +80,15 @@ public GenericRow transform(GenericRow record) {
     for (Map.Entry<String, PinotDataType> entry : _dataTypes.entrySet()) {
       String column = entry.getKey();
       Object value = record.getValue(column);
-
-      // Convert List value to Object[]
-      if (value instanceof List) {
-        value = ((List) value).toArray();
+      if (value == null) {

Review comment:
       Perform this check before standardizing the value to prevent the extra `record.putValue()`

##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformerTest.java
##########
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.recordtransformer;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertEqualsNoOrder;
+import static org.testng.Assert.assertNull;
+import static org.testng.Assert.fail;
+
+
+public class DataTypeTransformerTest {

Review comment:
       That is covered in `PinotDataTypeTest`






[GitHub] [incubator-pinot] Jackie-Jiang merged pull request #5760: Enhance DataTypeTransformer to handle nested Map/List/Object[]

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #5760:
URL: https://github.com/apache/incubator-pinot/pull/5760


   




[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5760: Enhance DataTypeTransformer to handle nested Map/List/Object[]

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5760:
URL: https://github.com/apache/incubator-pinot/pull/5760#issuecomment-665204779


   @jackjlli Addressed all the comments




[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5760: Enhance DataTypeTransformer to handle nested Map/List/Object[]

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5760:
URL: https://github.com/apache/incubator-pinot/pull/5760#discussion_r461235521



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/data/recordtransformer/DataTypeTransformerTest.java
##########
@@ -0,0 +1,149 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.recordtransformer;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.testng.annotations.Test;
+
+import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertEqualsNoOrder;
+import static org.testng.Assert.assertNull;
+import static org.testng.Assert.fail;
+
+
+public class DataTypeTransformerTest {
+  private static final String COLUMN = "testColumn";
+
+  @Test
+  public void testStandardize() {
+    // Tests for Map

Review comment:
       Can we please add tests for the exact case where we were seeing the problem? The AvroRecordExtractor extracted the column as an Object[] where each value in the array is a Map with a single KV pair, and this is what is passed to DataTypeTransformer. The actual type of the column in the schema is MV STRING, and the transformer should just convert it into an Object[] where each value is a String, namely the corresponding value from each Map.
   
   Looking at the code in standardize and standardizeCollection, it looks like the above scenario is well taken care of, but I want to make sure that when this code is changed in the future, either for handling maps properly or for something else, we don't regress again.
   
   @jackjlli, is there any other case that we were seeing when the problem was originally reproduced? It would be good to share a sample of the original data so that we can use it in tests here.
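
The scenario described in this comment could be pinned down with a check along these lines. It is only a sketch under stated assumptions: the thread does not include the original data, so the keys and values below are invented, and `flattenMapValues` is a hypothetical stand-in for the transformer's flattening step rather than a Pinot method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical regression check for an Object[] whose elements are single-entry Maps.
final class NestedMapRegressionSketch {
  private NestedMapRegressionSketch() {
  }

  // Stand-in for the flattening step: keeps each Map's values and ignores the keys
  static Object[] flattenMapValues(Object[] extracted) {
    List<Object> values = new ArrayList<>();
    for (Object element : extracted) {
      if (element instanceof Map) {
        values.addAll(((Map<?, ?>) element).values());
      } else if (element != null) {
        values.add(element);
      }
    }
    return values.toArray();
  }

  public static void main(String[] args) {
    // Invented sample shaped like the extractor output described above
    Object[] extracted = {
        Collections.singletonMap("testKey1", "testValue1"),
        Collections.singletonMap("testKey2", "testValue2")
    };
    Object[] actual = flattenMapValues(extracted);
    Object[] expected = {"testValue1", "testValue2"};
    if (!Arrays.equals(actual, expected)) {
      throw new AssertionError("Expected " + Arrays.toString(expected) + " but got " + Arrays.toString(actual));
    }
    System.out.println(Arrays.toString(actual));
  }
}
```

In the real test class the same input would presumably be passed to `DataTypeTransformer.standardize` with the single-value flag set to `false`, since the column is multi-value.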




