Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/17 05:59:03 UTC

[GitHub] [hudi] shengchiqu commented on issue #7229: [SUPPORT] flink connector sink Update the partition value, the old data is still there

shengchiqu commented on issue #7229:
URL: https://github.com/apache/hudi/issues/7229#issuecomment-1384871989

   @yihua thanks. My problem is that with changelog.enabled set to true, Flink incremental streaming reads work fine, but when Spark, Hive, or Flink reads the Hudi table offline (batch read), the result contains duplicates: every update is still present and nothing is de-duplicated. Are incremental streaming reads and batch reads mutually exclusive, or can both be used on the same table?
   
   changelog.enabled=true => Flink incremental streaming read shows the correct CDC; batch read returns duplicates
   ```shell
   +-------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+-------------------+--------------------------------+--------------------------------+-------------------------+
   |   C_CUSTKEY |                         C_NAME |                      C_ADDRESS |                    C_NATIONKEY |                        C_PHONE |         C_ACCTBAL |                   C_MKTSEGMENT |                      C_COMMENT |                      ts |
   +-------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+-------------------+--------------------------------+--------------------------------+-------------------------+
   |           1 |             Customer#000000001 |                              a |                              1 |                25-989-741-2988 |            711.56 |                       BUILDING | to the even, regular platel... | 2023-01-17 13:43:25.380 |
   |           1 |             Customer#000000001 |                              a |                             12 |                25-989-741-2988 |            711.56 |                       BUILDING | to the even, regular platel... | 2023-01-17 13:43:31.383 |
   |           1 |             Customer#000000001 |                              a |                              2 |                25-989-741-2988 |            711.56 |                       BUILDING | to the even, regular platel... | 2023-01-17 13:43:28.381 |
   |           1 |             Customer#000000001 |                              a |                              3 |                25-989-741-2988 |            711.56 |                       BUILDING | to the even, regular platel... | 2023-01-17 13:43:31.383 |
   +-------------+--------------------------------+--------------------------------+--------------------------------+--------------------------------+-------------------+--------------------------------+--------------------------------+-------------------------+
   ```
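
A minimal sketch of how the duplication in the batch-read output above can be checked, assuming `C_CUSTKEY` is the record key (the comment does not state this explicitly). The rows are copied from the table, reduced to three columns; `duplicated_keys` is a hypothetical helper and not part of any Hudi, Flink, or Spark API.

```python
from collections import Counter

# Rows copied from the batch-read output above: (C_CUSTKEY, C_NATIONKEY, ts).
# Assumption: C_CUSTKEY is the record key of the table.
rows = [
    (1, 1, "2023-01-17 13:43:25.380"),
    (1, 12, "2023-01-17 13:43:31.383"),
    (1, 2, "2023-01-17 13:43:28.381"),
    (1, 3, "2023-01-17 13:43:31.383"),
]


def duplicated_keys(rows):
    """Return {record_key: row_count} for every key that appears more than once."""
    counts = Counter(key for key, _, _ in rows)
    return {key: n for key, n in counts.items() if n > 1}


print(duplicated_keys(rows))
```

A de-duplicated batch read would be expected to return one row per key, so this check would report nothing.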
   
   changelog.enabled=false => Flink incremental streaming read fails with an error; batch read has no duplicates and the data is accurate
   ```shell
   Caused by: java.lang.IllegalStateException: Not expected to see delete records in this log-scan mode. Check Job Config
   	at org.apache.hudi.common.table.log.HoodieUnMergedLogRecordScanner.processNextDeletedRecord(HoodieUnMergedLogRecordScanner.java:60)
   	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
   	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
   	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:473)
   	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
   	... 10 more
   
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
