Posted to dev@hudi.apache.org by "Zuyeu, Anton" <zu...@amazon.com.INVALID> on 2020/06/19 04:57:02 UTC
IllegalStateException: Hudi File Id (...) has more than 1 pending compactions. Hudi 0.5.3 + S3
Hi Team,
We are trying to run incremental updates to our MoR Hudi table on S3, and it looks like the table inevitably gets corrupted after 20-30 commits. We do an initial data import and enable incremental upserts, then verify that the tables are readable by running:
hive> select * from table_name_ro limit 1;
But after letting the incremental upserts run for several hours, the above select query starts throwing exceptions like:
Failed with exception java.io.IOException:java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='983', fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending compactions.
Checking the compactions mentioned in the exception message via hudi-cli does indeed confirm that the file id is present in both pending compactions. The upsert settings we use are:
val hudiOptions = Map[String, String](
  HoodieWriteConfig.TABLE_NAME -> inputTableName,
  "hoodie.consistency.check.enabled" -> "true",
  "hoodie.compact.inline.max.delta.commits" -> "30",
  "hoodie.compact.inline" -> "true",
  "hoodie.clean.automatic" -> "true",
  "hoodie.cleaner.commits.retained" -> "1000",
  "hoodie.keep.min.commits" -> "1001",
  "hoodie.keep.max.commits" -> "1050",
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
  DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000"
)
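For reference, a minimal sketch of how we pass such an options map to the Spark DataFrame write (here `sourceDF` and `basePath` are placeholder names, not the actual ones from our job):

    import org.apache.spark.sql.SaveMode

    sourceDF.write
      .format("org.apache.hudi")
      .options(hudiOptions)
      // upsert is the default operation, shown explicitly for clarity
      .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
      .mode(SaveMode.Append)
      .save(basePath)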
Any suggestions on what could cause this, or how to debug it, would help a lot.
Thank you,
Anton Zuyeu
Re: IllegalStateException: Hudi File Id (...) has more than 1 pending compactions. Hudi 0.5.3 + S3
Posted by Bhavani Sudha <bh...@gmail.com>.
If you are running inline compaction, it should not cause two pending compactions on the same file group. Along with the above details, can you please open a [SUPPORT] GitHub issue with the full stack trace, and, if possible, an `ls` of your .hoodie folder?
Thanks,
Sudha
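The requested diagnostics could be gathered along these lines (a sketch; the bucket and table paths below are placeholders, not from this thread):

    # Listing of the timeline folder for an S3 table
    aws s3 ls s3://your-bucket/path/to/table/.hoodie/ > hoodie_listing.txt

    # Inspecting pending compactions from hudi-cli
    hudi-> connect --path s3://your-bucket/path/to/table
    hudi-> compactions show all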