Posted to commits@hudi.apache.org by "kkrugler (via GitHub)" <gi...@apache.org> on 2023/03/08 22:45:57 UTC

[GitHub] [hudi] kkrugler opened a new issue, #8136: [SUPPORT] Wrong type returned by ParquetColumnarRowSplitReader in hudi-flink1.16.x code

kkrugler opened a new issue, #8136:
URL: https://github.com/apache/hudi/issues/8136

   **Describe the problem you faced**
   
   When using Flink to do an incremental query read from a table, using the Hudi 0.13.0 release and Flink 1.15, I get a `java.lang.NoSuchMethodError`.
   
   I believe the issue is that the new ParquetColumnarRowSplitReader added for Flink 1.16 is returning `ColumnarRowData`, but it should be returning `RowData`, the same as the other versions of this class (for Flink 1.13/1.14/1.15).
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. `git clone https://github.com/kkrugler/flink-hudi-query-test`
   2. Edit the `pom.xml` file to set `<hudi.version>0.13.0</hudi.version>`.
   3. Run `mvn clean package`
   
   The `ExampleWorkflowTest.testHudiAndIncrementalQuery` test will fail.
   
   **Expected behavior**
   
   The tests should all pass.
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Flink version : 1.15.1
   
   **Stacktrace**
   
   ```
   java.lang.NoSuchMethodError: org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextRecord()Lorg/apache/flink/table/data/columnar/ColumnarRowData;
   	at org.apache.hudi.table.format.ParquetSplitRecordIterator.next(ParquetSplitRecordIterator.java:50)
   	at org.apache.hudi.table.format.ParquetSplitRecordIterator.next(ParquetSplitRecordIterator.java:32)
   	at org.apache.hudi.table.format.mor.MergeOnReadInputFormat.nextRecord(MergeOnReadInputFormat.java:271)
   	at org.apache.hudi.source.StreamReadOperator.consumeAsMiniBatch(StreamReadOperator.java:187)
   	at org.apache.hudi.source.StreamReadOperator.processSplits(StreamReadOperator.java:166)
   ```
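
A note on why a return-type difference alone can produce this error: the JVM links a call by the method's full descriptor, which includes the return type, so a caller compiled against `nextRecord()` returning `ColumnarRowData` will not link against a class whose `nextRecord()` returns `RowData`. The sketch below only illustrates that mechanism. The nested `RowData` and `ColumnarRowData` types are hypothetical stand-ins, not the Flink classes, and the class name is made up for this example.

```java
import java.lang.invoke.MethodType;

public class ReturnTypeLinkageDemo {
    // Hypothetical stand-ins for Flink's RowData interface and ColumnarRowData class.
    public interface RowData {}
    public static final class ColumnarRowData implements RowData {}

    /** Builds the JVM descriptor of a no-argument method with the given return type. */
    public static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // What the call site was compiled against vs. what the loaded class declares.
        String compiledAgainst = descriptor(ColumnarRowData.class);
        String presentAtRuntime = descriptor(RowData.class);

        System.out.println("call site expects: nextRecord" + compiledAgainst);
        System.out.println("class provides:    nextRecord" + presentAtRuntime);
        // The two descriptors differ, so the JVM cannot resolve the call
        // and throws NoSuchMethodError, even though the method name matches.
        System.out.println("descriptors match: " + compiledAgainst.equals(presentAtRuntime));
    }
}
```

The descriptor in the stack trace above, `nextRecord()Lorg/apache/flink/table/data/columnar/ColumnarRowData;`, has the same shape as the "call site expects" line this prints.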
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kkrugler commented on issue #8136: [SUPPORT] Wrong type returned by ParquetColumnarRowSplitReader in hudi-flink1.16.x code

Posted by "kkrugler (via GitHub)" <gi...@apache.org>.
kkrugler commented on issue #8136:
URL: https://github.com/apache/hudi/issues/8136#issuecomment-1462510643

   Hi @BruceKellan,
   
   > Hi kkrugler, I have looked at your project code.
   > 
   > hudi-flink is not meant to be used directly by users.
   > 
   > You rely on hudi-flink in your code, so you indirectly depend on org.apache.hudi:hudi-flink1.16.x:jar, which causes a class conflict. You can check this by running:
   > 
   > ```shell
   > mvn dependency:tree
   > ```
   
   I explicitly exclude `hudi-flink1.16.x` in the pom.xml. If you follow the steps to reproduce (above), I'm curious what you get with `mvn dependency:tree`, as mine doesn't show `hudi-flink1.16.x`.
    
   Also see https://github.com/apache/hudi/pull/7651/files, and the Jira issue it references, for more background on the root cause of the bug.
   
   > Since it is unrealistic and unfriendly to expect users to sort out the internal dependency structure, we provide bundle packages, and users only need to depend on the hudi-flink-bundle-jar, such as [hudi-flink1.15-bundle](https://mvnrepository.com/artifact/org.apache.hudi/hudi-flink1.15-bundle/0.13.0).
   
   Unfortunately, for workflows with significant dependencies, using the `hudi-flink-bundle-jar` is impractical: it pulls in many additional dependencies, which leads to version conflicts with multiple jars on the classpath.
   
   It would be better to have a bundle jar that fully shades all dependencies, though that would create a very, very large jar.
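
As a complement to `mvn dependency:tree`, a small reflection check can show which jar actually supplies a class at runtime and what return type one of its methods declares. This is a minimal sketch under stated assumptions: `ClassOriginCheck` and its helpers are hypothetical names invented here, the method being inspected is assumed to be public and to take no arguments, and the defaults use a JDK class so the program runs without Hudi or Flink on the classpath.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

public class ClassOriginCheck {
    /** Location the class was loaded from, or a placeholder when none is recorded. */
    public static String origin(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap/JDK)";
        }
        return source.getLocation().toString();
    }

    /** Name of the return type declared by the public no-argument method. */
    public static String returnTypeOf(Class<?> type, String methodName)
            throws NoSuchMethodException {
        Method method = type.getMethod(methodName);
        return method.getReturnType().getName();
    }

    public static void main(String[] args) throws Exception {
        // Defaults are a JDK class and method so the sketch runs anywhere;
        // pass a fully qualified class name and a method name to inspect others.
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        String methodName = args.length > 1 ? args[1] : "size";

        Class<?> type = Class.forName(className);
        System.out.println(className + " loaded from " + origin(type));
        System.out.println(methodName + "() returns " + returnTypeOf(type, methodName));
    }
}
```

Run with the application's classpath and the arguments `org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader nextRecord` (the class and method named in the stack trace), it should print the jar the reader came from and the return type that jar declares.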
   




[GitHub] [hudi] danny0405 commented on issue #8136: [SUPPORT] Wrong type returned by ParquetColumnarRowSplitReader in hudi-flink1.16.x code

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8136:
URL: https://github.com/apache/hudi/issues/8136#issuecomment-1461346106

   Sorry @kkrugler, could you submit a fix similar to the previous one?




[GitHub] [hudi] danny0405 closed issue #8136: [SUPPORT] Wrong type returned by ParquetColumnarRowSplitReader in hudi-flink1.16.x code

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 closed issue #8136: [SUPPORT] Wrong type returned by ParquetColumnarRowSplitReader in hudi-flink1.16.x code
URL: https://github.com/apache/hudi/issues/8136




[GitHub] [hudi] BruceKellan commented on issue #8136: [SUPPORT] Wrong type returned by ParquetColumnarRowSplitReader in hudi-flink1.16.x code

Posted by "BruceKellan (via GitHub)" <gi...@apache.org>.
BruceKellan commented on issue #8136:
URL: https://github.com/apache/hudi/issues/8136#issuecomment-1461231411

   Hi kkrugler, I have looked at your project code.
   
   hudi-flink is not meant to be used directly by users.
   
   You rely on hudi-flink in your code, so you indirectly depend on org.apache.hudi:hudi-flink1.16.x:jar. You can check this by running:
   
   ```shell
   mvn dependency:tree
   ```
   
   Since it is unrealistic and unfriendly to expect users to sort out the internal dependency structure, we provide bundle packages, and users only need to depend on the hudi-flink-bundle-jar, such as [hudi-flink1.15-bundle](https://mvnrepository.com/artifact/org.apache.hudi/hudi-flink1.15-bundle/0.13.0).




[GitHub] [hudi] kkrugler commented on issue #8136: [SUPPORT] Wrong type returned by ParquetColumnarRowSplitReader in hudi-flink1.16.x code

Posted by "kkrugler (via GitHub)" <gi...@apache.org>.
kkrugler commented on issue #8136:
URL: https://github.com/apache/hudi/issues/8136#issuecomment-1462545592

   Hi @danny0405 - see https://github.com/apache/hudi/pull/8145

