Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/24 10:11:43 UTC

[GitHub] [hudi] parisni opened a new issue, #5960: [SUPPORT] AWSGlueCatalogSyncClient partition AlreadyExistsException when syncing large table from scratch

parisni opened a new issue, #5960:
URL: https://github.com/apache/hudi/issues/5960

   hudi 0.11.1
   spark 3.2.1
   
   I have several Hudi tables with more than 35k partitions. When running Hive sync for the first time, I randomly get the error below, which says the partition already exists. This is odd because the table did not exist yet and the partition is not duplicated in the list.
   
   As a workaround I caught the error in the AWSGlueCatalogSyncClient, but before proposing a PR, I'd like to know whether this is expected.
   
   ```
   2547338 [Driver] INFO  org.apache.spark.deploy.yarn.ApplicationMaster  - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing <table_name>
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:737)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table <table_name>
           at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:497)
           at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:264)
           at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:172)
           at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:159)
           ... 15 more
   Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to add partitions to <db>.<table_name>
           at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:145)
           at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:479)
           ... 18 more
   Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to add partitions to<db>.<table_name>
   with error(s): [{PartitionValues: [7, 2021-07-14, 13],ErrorDetail: {ErrorCode: AlreadyExistsException,ErrorMessage: Partition already exists.}}]
           at org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:140)
           ... 19 more
   ```
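
The workaround mentioned above (tolerating the error inside the Glue sync client) could look roughly like the sketch below. It is not the actual Hudi patch: `PartitionFailure` and `PartitionErrorFilter` are hypothetical names that only mirror the `PartitionValues` and `ErrorCode` fields visible in the log, and the idea that an `AlreadyExistsException` entry can be safely ignored on an initial sync is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one entry of the error list reported by the
// batch partition creation call; it mirrors the fields seen in the log.
final class PartitionFailure {
    final List<String> partitionValues;
    final String errorCode;

    PartitionFailure(List<String> partitionValues, String errorCode) {
        this.partitionValues = partitionValues;
        this.errorCode = errorCode;
    }
}

// Hypothetical helper: decides which reported failures should abort the sync.
final class PartitionErrorFilter {
    static final String ALREADY_EXISTS = "AlreadyExistsException";

    // Keeps every failure except "partition already exists", which is
    // assumed here to be harmless because the partition is present anyway.
    static List<PartitionFailure> fatalFailures(List<PartitionFailure> failures) {
        List<PartitionFailure> fatal = new ArrayList<>();
        for (PartitionFailure failure : failures) {
            if (!ALREADY_EXISTS.equals(failure.errorCode)) {
                fatal.add(failure);
            }
        }
        return fatal;
    }
}
```

With such a filter, the sync client would throw only when `fatalFailures(...)` returns a non-empty list, and could log the ignored entries instead of failing the whole job.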


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] codope commented on issue #5960: [SUPPORT] AWSGlueCatalogSyncClient partition AlreadyExistsException when syncing large table from scratch

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5960:
URL: https://github.com/apache/hudi/issues/5960#issuecomment-1201111260

   @parisni Any update on this issue? If it's still happening, can you please start from scratch syncing to a different database, and provide the sync tool command that you ran?




[GitHub] [hudi] nsivabalan closed issue #5960: [SUPPORT] AWSGlueCatalogSyncClient partition AlreadyExistsException when syncing large table from scratch

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #5960: [SUPPORT] AWSGlueCatalogSyncClient partition AlreadyExistsException when syncing large table from scratch
URL: https://github.com/apache/hudi/issues/5960




[GitHub] [hudi] nsivabalan commented on issue #5960: [SUPPORT] AWSGlueCatalogSyncClient partition AlreadyExistsException when syncing large table from scratch

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5960:
URL: https://github.com/apache/hudi/issues/5960#issuecomment-1229346898

   thanks! 




[GitHub] [hudi] parisni commented on issue #5960: [SUPPORT] AWSGlueCatalogSyncClient partition AlreadyExistsException when syncing large table from scratch

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #5960:
URL: https://github.com/apache/hudi/issues/5960#issuecomment-1212279531

   I am using the sync command programmatically. Indeed, the Glue error happens from time to time. The Glue backend does not look that stable, so I guess the sync process should handle those cases better; otherwise it fails and leaves the Glue metastore corrupted.
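
One way a sync step might absorb intermittent backend errors is a bounded retry with a growing delay, sketched below. This is only an illustration under assumptions: `TransientRetry` and `withRetry` are hypothetical names, not part of Hudi or the AWS SDK, and the sketch retries on any exception, whereas a real client would restrict retries to error codes known to be transient.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not a Hudi or AWS SDK class.
final class TransientRetry {
    // Runs the call up to maxAttempts times. After each failed attempt
    // except the last, it sleeps and then doubles the delay.
    // The last exception is rethrown once all attempts are used up.
    static <T> T withRetry(Callable<T> call, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

A caller could wrap a single partition batch in `withRetry(...)` so that one intermittent failure does not abort the whole sync; whether a retried batch is safe to resubmit depends on the backend and is not established in this thread.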
   
   On August 1, 2022 12:04:13 PM UTC, Sagar Sumit ***@***.***> wrote:
   ***@***.*** Any update on this issue? If it's still happening, can you please start from scratch syncing to a different database, and provide the sync tool command that you ran?




[GitHub] [hudi] yihua commented on issue #5960: [SUPPORT] AWSGlueCatalogSyncClient partition AlreadyExistsException when syncing large table from scratch

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #5960:
URL: https://github.com/apache/hudi/issues/5960#issuecomment-1169033862

   @parisni the exception of `Partition already exists` shouldn't happen.  Could you provide the `HiveSyncTool` command with the arguments you use for reproducing the issue?

