Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/10/19 09:48:23 UTC

[GitHub] [flink] luoyuxia commented on a diff in pull request #20988: [FLINK-29547][table] Fix the bug of Select a[1] which is array type for parquet complex type throw ClassCastException

luoyuxia commented on code in PR #20988:
URL: https://github.com/apache/flink/pull/20988#discussion_r999205523


##########
flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/data/conversion/ArrayObjectArrayConverter.java:
##########
@@ -100,7 +100,6 @@ public E[] toExternal(ArrayData internal) {
             if (genericArray.isPrimitiveArray()) {
                 return genericToJavaArrayConverter.convert((GenericArrayData) internal);
             }
-            return (E[]) genericArray.toObjectArray();

Review Comment:
   I'm wondering whether it's a good idea to always remove this line and fall back to `toJavaArray(internal)`.
   From my side, `(E[]) genericArray.toObjectArray()` looks like an optimization compared to `toJavaArray(internal)`.
   In the vectorized path it has to fall back to `toJavaArray(internal)`, otherwise it will fail.
   But in the non-vectorized path everything is fine even if we keep this line.
   For example, if we force the array read to be non-vectorized by modifying the method `isVectorizationUnsupported`, the test will still pass with this line kept.
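   To make that concrete, here is a minimal sketch of keeping the cast as a guarded fast path instead of deleting it outright. This assumes the converter's external element class is available in a field (called `elementClass` here), so treat it as an illustration rather than the final fix:
   ```java
   if (internal instanceof GenericArrayData) {
       final GenericArrayData genericArray = (GenericArrayData) internal;
       if (genericArray.isPrimitiveArray()) {
           return genericToJavaArrayConverter.convert(genericArray);
       }
       final Object[] objectArray = genericArray.toObjectArray();
       // Fast path (non-vectorized case): the backing array already holds
       // external elements, so it can be reused without per-element copying.
       if (elementClass.isAssignableFrom(objectArray.getClass().getComponentType())) {
           return (E[]) objectArray;
       }
       // Vectorized case: the elements are still internal data structures,
       // so fall through to the per-element conversion below.
   }
   return toJavaArray(internal);
   ```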
   



##########
flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/HiveTableSourceITCase.java:
##########
@@ -190,6 +190,29 @@ public void testReadParquetComplexDataType() throws Exception {
         batchTableEnv.unloadModule("hive");
     }
 
+    @Test
+    public void testReadParquetArrayDataType() throws Exception {
+        batchTableEnv.executeSql(
+                "create table parquet_complex_type_test("
+                        + "a array<int>, m map<int,string>, s struct<f1:int,f2:bigint>) stored as parquet");
+        // load hive module so that we can use array,map, named_struct function
+        // for convenient writing complex data
+        batchTableEnv.loadModule("hive", new HiveModule());
+        batchTableEnv.useModules("hive", CoreModuleFactory.IDENTIFIER);
+
+        batchTableEnv
+                .executeSql(
+                        "insert into parquet_complex_type_test"
+                                + " select array(1, 2), map(1, 'val1', 2, 'val2'),"
+                                + " named_struct('f1', 1,  'f2', 2)")
+                .await();
+
+        Table src = batchTableEnv.sqlQuery("select a[1], a[3] from parquet_complex_type_test");

Review Comment:
   I think we can just move the test to `testReadParquetComplexDataType`.
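   For reference, a rough sketch of the extra assertion that could live inside `testReadParquetComplexDataType`. The expected `[+I[1, null]]` assumes Flink's 1-based array indexing returns `null` for the out-of-range `a[3]`, and the helpers (`CollectionUtil.iteratorToList`, AssertJ's `assertThat`) follow the style used elsewhere in this test class:
   ```java
   // Hypothetical addition to testReadParquetComplexDataType, reusing the
   // parquet_complex_type_test table populated with array(1, 2).
   Table src = batchTableEnv.sqlQuery(
           "select a[1], a[3] from parquet_complex_type_test");
   List<Row> rows = CollectionUtil.iteratorToList(src.execute().collect());
   assertThat(rows.toString()).isEqualTo("[+I[1, null]]");
   ```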


