Posted to commits@hudi.apache.org by "yuemeng (Jira)" <ji...@apache.org> on 2022/07/14 09:24:00 UTC

[jira] [Created] (HUDI-4397) Flink Inline Cluster and Compact plan distribute strategy changed from rebalance to hash to avoid potential multiple threads accessing the same file

yuemeng created HUDI-4397:
-----------------------------

             Summary: Flink Inline Cluster and Compact plan distribute strategy changed from rebalance to hash to avoid potential multiple threads accessing the same file
                 Key: HUDI-4397
                 URL: https://issues.apache.org/jira/browse/HUDI-4397
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: yuemeng


Currently, the Flink inline clustering and compaction plans are distributed to subtasks with the rebalance strategy.

When a compaction operation fails, it is rolled back and executed again.

With rebalance (round-robin) distribution, the plan for the same file may be sent to a different thread on retry, so the file can be accessed by multiple threads at once, e.g. the failed rollback thread and a normal compaction thread. This causes the following error:

```
writing record  HoodieRecord{key=HoodieKey{recordKey=a:100 partitionPath=2022-06-30/18}, currentLocation='null', newLocation='null'}
java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: COLUMN
        at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:217) ~[?:?]
        at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:209) ~[?:?]
        at org.apache.hudi.org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:407) ~[?:?]
        at org.apache.hudi.org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:184) ~[?:?]
```
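
The difference between the two strategies can be sketched in plain Java, with no Flink dependency. Hash distribution derives the task index from the file id (the idea behind Flink's keyBy/hash partitioning), so every plan for one file lands on the same task; rebalance assigns tasks round-robin regardless of the file id, so a retried plan for the same file can land on a different task. The class and method names below are illustrative, not Hudi's or Flink's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: hash distribution pins a file to one task,
// while rebalance (round-robin) can scatter the same file across tasks.
public class DistributionSketch {
    static final int PARALLELISM = 4;

    // Hash distribution: task index depends only on the file id,
    // so all plans for one file go to the same subtask.
    static int hashTask(String fileId) {
        return Math.floorMod(fileId.hashCode(), PARALLELISM);
    }

    // Rebalance distribution: round-robin over subtasks, ignoring the file id.
    static int rebalanceTask(int sequenceNumber) {
        return sequenceNumber % PARALLELISM;
    }

    public static void main(String[] args) {
        // Two plans for the same file: a rolled-back attempt and its retry.
        List<String> plans = List.of("file-A", "file-A");

        List<Integer> hashTasks = new ArrayList<>();
        List<Integer> rebalanceTasks = new ArrayList<>();
        for (int i = 0; i < plans.size(); i++) {
            hashTasks.add(hashTask(plans.get(i)));
            rebalanceTasks.add(rebalanceTask(i));
        }

        // Hash: both plans map to one task, so the file is never written concurrently.
        System.out.println("hash tasks: " + hashTasks);
        // Rebalance: the two plans land on different tasks -> concurrent-access risk.
        System.out.println("rebalance tasks: " + rebalanceTasks);
    }
}
```

With hash distribution both entries of `hashTasks` are equal, whereas `rebalanceTasks` contains two different indices, which is exactly the condition that lets two threads open the same Parquet file.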

--
This message was sent by Atlassian Jira
(v8.20.10#820010)