Posted to dev@hudi.apache.org by "Zuyeu, Anton" <zu...@amazon.com.INVALID> on 2020/06/19 04:57:02 UTC

IllegalStateException: Hudi File Id (...) has more than 1 pending compactions. Hudi 0.5.3 + S3

Hi Team,

We are trying to run incremental updates to our MoR Hudi table on S3, and it looks like the table inevitably gets corrupted after 20-30 commits. We do an initial data import, enable incremental upserts, and then verify that the table is readable by running:
hive> select * from table_name_ro limit 1;

but after letting the incremental upserts run for several hours, the above select query starts throwing exceptions like:
Failed with exception java.io.IOException:java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='983', fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending compactions.

Checking the compactions mentioned in the exception message via hudi-cli does indeed confirm that the file id is present in both pending compactions (the hudi-cli commands are sketched after the settings below). The upsert settings that we use are:
        val hudiOptions = Map[String, String](
          HoodieWriteConfig.TABLE_NAME -> inputTableName,
          // wait for files to become visible on S3 after each write
          "hoodie.consistency.check.enabled" -> "true",
          // run compaction inline, triggered after every 30 delta commits
          "hoodie.compact.inline.max.delta.commits" -> "30",
          "hoodie.compact.inline" -> "true",
          // clean automatically after writes, retaining the last 1000 commits
          "hoodie.clean.automatic" -> "true",
          "hoodie.cleaner.commits.retained" -> "1000",
          // bounds for archiving the active timeline
          "hoodie.keep.min.commits" -> "1001",
          "hoodie.keep.max.commits" -> "1050",
          DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
          DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
          DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
          DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
          DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
          DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
          DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
          DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
          DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
          DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000"
        )
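
For context, the incremental upserts go through the standard Hudi Spark datasource write using the options above; roughly the following, where `upsertDf` and `basePath` are placeholders for our incremental batch DataFrame and the table's S3 base path:

        // Sketch of the upsert call; variable names are placeholders.
        // The datasource operation defaults to upsert, so no explicit
        // OPERATION_OPT_KEY is set here.
        import org.apache.spark.sql.SaveMode

        upsertDf.write
          .format("org.apache.hudi")
          .options(hudiOptions)
          .mode(SaveMode.Append)
          .save(basePath)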

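For reference, this is roughly how we confirmed in hudi-cli that the file id appears in two pending compactions (the table path and compaction instant times below are placeholders):

        hudi-> connect --path s3://<bucket>/<table-base-path>
        hudi-> compactions show all
        hudi-> compaction show --instant <first_pending_instant>
        hudi-> compaction show --instant <second_pending_instant>
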
Any suggestions on what could cause this issue, or how to debug it, would help a lot.

Thank you,
Anton Zuyeu

Re: IllegalStateException: Hudi File Id (...) has more than 1 pending compactions. Hudi 0.5.3 + S3

Posted by Bhavani Sudha <bh...@gmail.com>.
If you are running inline compaction, it should not cause two pending
compactions on the same file group. Along with the above details, can you
please open a [SUPPORT] GitHub issue with the full stack trace, and also an
`ls` of your .hoodie folder if possible?
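
For an S3-backed table, something like the following should capture that listing (bucket and base path are placeholders):

    aws s3 ls --recursive s3://<bucket>/<base-path>/.hoodie/ > hoodie_listing.txt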

Thanks,
Sudha
